Toward automated consumer question answering: Automatically separating consumer questions from professional questions in the healthcare domain

Toward automated consumer question answering: Automatically separating consumer questions from professional questions in the healthcare domain. Liu, F., Antieau, L. D., & Yu, H. Journal of Biomedical Informatics, 44(6):1032–1038, December 2011.


Objective
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.
Design
We obtained two sets of consumer questions (∼10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer–PointCare dataset on the Consumer–OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features.
Results
The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer–PointCare and Consumer–OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer–PointCare model on the Consumer–OnlinePractice dataset.
Conclusion
Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.
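
The abstract mentions information gain for feature reduction and reports results as F1-measure. As a rough illustration only, the sketch below computes both quantities on a tiny made-up dataset. It is not the authors' implementation: the function names, the binary toy features, and the four example labels are all assumptions introduced here, and the paper's actual feature sets and classifier are not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG = H(labels) - sum over v of P(feature = v) * H(labels | feature = v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def f1_measure(gold, predicted, positive):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four questions with gold labels and two binary features.
labels = ["consumer", "consumer", "professional", "professional"]
informative_feature = [1, 1, 0, 0]    # perfectly separates the two classes
uninformative_feature = [1, 0, 1, 0]  # independent of the class

ig_informative = information_gain(informative_feature, labels)
ig_uninformative = information_gain(uninformative_feature, labels)

# Hypothetical predictions from some classifier, scored for the consumer class.
predicted = ["consumer", "professional", "professional", "professional"]
f1_consumer = f1_measure(labels, predicted, positive="consumer")
```

In a feature-reduction setting, features would be ranked by their information gain and only the highest-scoring ones kept; the cutoff used in the paper is not stated in the abstract.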

@article{liu_toward_2011,
	title = {Toward automated consumer question answering: {Automatically} separating consumer questions from professional questions in the healthcare domain},
	volume = {44},
	issn = {1532-0464},
	shorttitle = {Toward automated consumer question answering},
	url = {https://www.sciencedirect.com/science/article/pii/S1532046411001353},
	doi = {10.1016/j.jbi.2011.08.008},
	abstract = {Objective
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.
Design
We obtained two sets of consumer questions (∼10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer–PointCare dataset on the Consumer–OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features.
Results
The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer–PointCare and Consumer–OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer–PointCare model on the Consumer–OnlinePractice dataset.
Conclusion
Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.},
	language = {en},
	number = {6},
	urldate = {2022-12-02},
	journal = {Journal of Biomedical Informatics},
	author = {Liu, Feifan and Antieau, Lamont D. and Yu, Hong},
	month = dec,
	year = {2011},
	keywords = {Medical question answering, Natural language processing, Question classification, Supervised machine learning, Support vector machines},
	pages = {1032--1038},
}

