Assessing the bias in samples of large online networks. González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. Social Networks, 38:16–27, July 2014.
We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication.
@article{gonzalez-bailon_assessing_2014,
	title = {Assessing the bias in samples of large online networks},
	volume = {38},
	issn = {0378-8733},
	url = {http://www.sciencedirect.com/science/article/pii/S0378873314000057},
	doi = {10.1016/j.socnet.2014.01.004},
	abstract = {We consider the sampling bias introduced in the study of online networks when collecting data through publicly available APIs (application programming interfaces). We assess differences between three samples of Twitter activity; the empirical context is given by political protests taking place in May 2012. We track online communication around these protests for the period of one month, and reconstruct the network of mentions and re-tweets according to the search and the streaming APIs, and to different filtering parameters. We find that smaller samples do not offer an accurate picture of peripheral activity; we also find that the bias is greater for the network of mentions, partly because of the higher influence of snowballing in identifying relevant nodes. We discuss the implications of this bias for the study of diffusion dynamics and political communication through social media, and advocate the need for more uniform sampling procedures to study online communication.},
	urldate = {2014-03-10},
	journal = {Social Networks},
	author = {González-Bailón, Sandra and Wang, Ning and Rivero, Alejandro and Borge-Holthoefer, Javier and Moreno, Yamir},
	month = jul,
	year = {2014},
	keywords = {Graph comparison, Measurement error, Political communication, Social media, Social protests, Twitter},
	pages = {16--27},
	file = {ScienceDirect Full Text PDF:files/48502/González-Bailón et al. - 2014 - Assessing the bias in samples of large online netw.pdf:application/pdf;ScienceDirect Full Text PDF:files/48965/González-Bailón et al. - 2014 - Assessing the bias in samples of large online netw.pdf:application/pdf;ScienceDirect Snapshot:files/48503/S0378873314000057.html:text/html;ScienceDirect Snapshot:files/48967/González-Bailón et al. - 2014 - Assessing the bias in samples of large online netw.html:text/html}
}
