Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling

Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling. Missier, P., Romanovsky, A., Miu, T., Pal, A., Daniilakis, M., Garcia, A., Cedrim, D., & Sousa, L. In Procs. SoWeMine workshop, co-located with ICWE 2016, Lugano, Switzerland, 2016.


Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes on Twitter across multiple epidemic outbreaks. In order to address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. However, while clusters can be easily re-generated following changes in epidemics, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.
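The supervised side of the comparison in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, the features, or the label set, so everything below is an assumption made for illustration: a multinomial Naive Bayes model with add-one smoothing stands in for the paper's classifier, the labels `relevant` and `irrelevant` are hypothetical, and the six example tweets are invented rather than taken from the paper's data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-written training tweets (not from the paper's data set).
TRAINING = [
    ("I have dengue fever and a headache", "relevant"),
    ("dengue cases rising in my neighbourhood", "relevant"),
    ("my sister is in hospital with dengue", "relevant"),
    ("this traffic is giving me dengue lol", "irrelevant"),
    ("new song called dengue is great", "irrelevant"),
    ("lol that joke about dengue", "irrelevant"),
]

model = train(TRAINING)
print(predict(model, "my brother has dengue fever"))
print(predict(model, "lol that song"))
```

The sketch also makes the stated shortcoming concrete: every entry in `TRAINING` must be labelled by hand, so the labelled set has to be rebuilt whenever the vocabulary of an outbreak shifts.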

@inproceedings{missier_tracking_2016,
	address = {Lugano, Switzerland},
	title = {Tracking {Dengue} {Epidemics} using {Twitter} {Content} {Classification} and {Topic} {Modelling}},
	url = {http://arxiv.org/abs/1605.00968},
	abstract = {Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes on Twitter across multiple epidemic outbreaks. In order to address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80\% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, LDA-based topic modelling scales well, generating cohesive and well-separated clusters from larger samples. However, while clusters can be easily re-generated following changes in epidemics, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.},
	booktitle = {Procs. {SoWeMine} workshop, co-located with {ICWE} 2016},
	author = {Missier, Paolo and Romanovsky, A and Miu, T and Pal, A and Daniilakis, M and Garcia, A and Cedrim, D and Sousa, L},
	year = {2016},
	keywords = {\#social media analytics, \#twitter analytics},
}
