Can you Trust the Trend? Discovering Simpson's Paradoxes in Social Data. Alipourfard, N., Fennell, P., & Lerman, K. In Proceedings of the 11th International ACM Conference on Web Search and Data Mining (WSDM), 2018. ACM.
We investigate how Simpson's paradox affects analysis of trends in social data. According to the paradox, the trends observed in data that has been aggregated over an entire population may be quite different from, and even opposite to, those of the underlying subgroups. Failure to take this effect into account can lead analysis to wrong conclusions. We present a statistical method to automatically identify Simpson's paradox in data by comparing statistical trends in the aggregate data to those in the disaggregated subgroups. We apply the approach to data from Stack Exchange, a popular question-answering platform, to analyze factors affecting answerer performance, specifically, the likelihood that an answer provided by a user will be accepted by the asker as the best answer to his or her question. Our analysis confirms a known Simpson's paradox and identifies several new instances. These paradoxes provide novel insights into user behavior on Stack Exchange.
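The comparison the abstract describes, aggregate trend versus subgroup trends, can be illustrated with a small sketch. This is not the authors' method: it uses a plain least-squares slope as a stand-in for whatever trend statistic the paper fits, and the function names (slope, sign, find_reversal) and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (0.0 if xs has no spread)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def find_reversal(records):
    """Compare the aggregate trend with per-group trends.

    records is an iterable of (group, x, y) tuples. Returns the aggregate
    slope, a dict of per-group slopes, and a flag that is True when every
    group's slope has the opposite sign to a nonzero aggregate slope.
    """
    groups = defaultdict(lambda: ([], []))
    all_x, all_y = [], []
    for group, x, y in records:
        groups[group][0].append(x)
        groups[group][1].append(y)
        all_x.append(x)
        all_y.append(y)

    aggregate = slope(all_x, all_y)
    per_group = {g: slope(xs, ys) for g, (xs, ys) in groups.items()}
    reversed_trend = aggregate != 0 and all(
        sign(s) == -sign(aggregate) for s in per_group.values()
    )
    return aggregate, per_group, reversed_trend


# Toy data (made up): y falls with x inside each group, but group B sits
# higher on both axes, so the pooled trend points upward.
toy = [
    ("A", 1, 10), ("A", 2, 9), ("A", 3, 8),
    ("B", 6, 20), ("B", 7, 19), ("B", 8, 18),
]
aggregate, per_group, reversed_trend = find_reversal(toy)
print(aggregate, per_group, reversed_trend)
```

On the toy data the pooled slope is positive while both group slopes are negative, so the flag is set. A real analysis would also need a significance test for each fitted trend, which this sketch omits.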
@INPROCEEDINGS{Alipourfard2018wsdm,
  author =       {Nazanin Alipourfard and Peter Fennell and Kristina Lerman},
  title =        {Can you Trust the Trend? Discovering Simpson's Paradoxes in Social Data},
  booktitle =    {Proceedings of the 11th International ACM Conference on Web Search and Data Mining (WSDM)},
  year =         {2018},
  pages =        {},
  doi={10.1145/3159652.3159684},
  publisher =    {ACM},
  abstract =     {We investigate how Simpson's paradox affects analysis of trends
in social data. According to the paradox, the trends observed in
data that has been aggregated over an entire population may be
quite different from, and even opposite to, those of the underlying
subgroups. Failure to take this effect into account can lead
analysis to wrong conclusions. We present a statistical method to
automatically identify Simpson's paradox in data by comparing
statistical trends in the aggregate data to those in the disaggregated
subgroups. We apply the approach to data from Stack Exchange, a
popular question-answering platform, to analyze factors affecting
answerer performance, specifically, the likelihood that an answer
provided by a user will be accepted by the asker as the best answer
to his or her question. Our analysis confirms a known Simpson's
paradox and identifies several new instances. These paradoxes provide
novel insights into user behavior on Stack Exchange.},
  keywords =     {social-dynamics},
}
