Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance

Detecting Hypoglycemia Incidents Reported in Patients’ Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance. Chen, J., Lalor, J., Liu, W., Druhl, E., Granillo, E., Vimalananda, V. G., & Yu, H. Journal of Medical Internet Research, 21(3), March 2019.


Background
Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety.

Objective
We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages.

Methods
An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.

Results
The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect.

Conclusions
Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.
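
To make the scale of the imbalance concrete, the following is a minimal sketch of one common way to derive class weights for cost-sensitive learning from the label counts given in the abstract (114 positive threads out of 3000). The abstract does not state which cost scheme HypoDetect used, so the "balanced" heuristic here (weight = n_samples / (n_classes * class_count)) and the function name `balanced_class_weights` are assumptions for illustration only.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    This is a generic heuristic for cost-sensitive learning; it is an
    assumption here, not the weighting reported for HypoDetect.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Label counts taken from the abstract: 114 positive threads out of 3000.
n_total = 3000
n_positive = 114
labels = [1] * n_positive + [0] * (n_total - n_positive)

weights = balanced_class_weights(labels)
positive_rate = n_positive / n_total

print(f"positive rate: {positive_rate:.2%}")
print(f"weight for positive class: {weights[1]:.3f}")
print(f"weight for negative class: {weights[0]:.3f}")
```

Under this heuristic, an error on a positive thread costs roughly 25 times as much as an error on a negative one, which pushes a classifier such as logistic regression toward higher sensitivity on the rare class.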

@article{chen_detecting_2019,
	title = {Detecting {Hypoglycemia} {Incidents} {Reported} in {Patients}’ {Secure} {Messages}: {Using} {Cost}-{Sensitive} {Learning} and {Oversampling} to {Reduce} {Data} {Imbalance}},
	volume = {21},
	issn = {1439-4456},
	shorttitle = {Detecting {Hypoglycemia} {Incidents} {Reported} in {Patients}’ {Secure} {Messages}},
	url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431826/},
	doi = {10.2196/11990},
	abstract = {Background
Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety.

Objective
We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients’ secure messages.

Methods
An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80\%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data.

Results
The interannotator agreement was Cohen kappa=.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity 0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect.

Conclusions
Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has a great potential to facilitate early detection and treatment of hypoglycemia.},
	number = {3},
	urldate = {2019-12-29},
	journal = {Journal of Medical Internet Research},
	author = {Chen, Jinying and Lalor, John and Liu, Weisong and Druhl, Emily and Granillo, Edgard and Vimalananda, Varsha G. and Yu, Hong},
	month = mar,
	year = {2019},
	pmid = {30855231},
	pmcid = {PMC6431826},
}
