Toward automated consumer question answering: Automatically separating consumer questions from professional questions in the healthcare domain. Liu, F., Antieau, L. D., & Yu, H. Journal of Biomedical Informatics, 44(6):1032–1038, December, 2011.
OBJECTIVE: Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.

DESIGN: We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features.

RESULTS: The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset.

CONCLUSION: Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.
@article{liu_toward_2011,
	title = {Toward automated consumer question answering: {Automatically} separating consumer questions from professional questions in the healthcare domain},
	volume = {44},
	issn = {1532-0464},
	shorttitle = {Toward automated consumer question answering},
	url = {http://linkinghub.elsevier.com/retrieve/pii/S1532046411001353},
	doi = {10.1016/j.jbi.2011.08.008},
	abstract = {OBJECTIVE:
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.

DESIGN:
We obtained two sets of consumer questions ({\textasciitilde}10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features.

RESULTS:
The 10-fold cross-validation results showed the best F1-measure of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset.

CONCLUSION:
Healthcare consumer questions posted at Yahoo online communities can be reliably classified from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.},
	language = {en},
	number = {6},
	urldate = {2016-11-30},
	journal = {Journal of Biomedical Informatics},
	author = {Liu, Feifan and Antieau, Lamont D. and Yu, Hong},
	month = dec,
	year = {2011},
	pmid = {21856442},
	pmcid = {PMC3226885},
	keywords = {Artificial Intelligence, Consumer Participation, Databases, Factual, Delivery of Health Care, Humans, Information Dissemination, Information Storage and Retrieval, Internet, Point-of-Care Systems, Semantics, natural language processing},
	pages = {1032--1038},
}
