Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach

Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach. Chen, J., Jagannatha, A. N., Fodeh, S. J., & Yu, H. JMIR Medical Informatics, 5(4):e42, October 2017.

BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first.

OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms.

METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.

RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P<.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially.

CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.

@article{chen_ranking_2017,
	title = {Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes: adapted distant supervision approach},
	volume = {5},
	issn = {2291-9694},
	shorttitle = {Ranking medical terms to support expansion of lay language resources for patient comprehension of electronic health record notes},
	doi = {10.2196/medinform.8531},
	abstract = {BACKGROUND: Medical terms are a major obstacle for patients to comprehend their electronic health record (EHR) notes. Clinical natural language processing (NLP) systems that link EHR terms to lay terms or definitions allow patients to easily access helpful information when reading through their EHR notes, and have been shown to improve patient EHR comprehension. However, high-quality lay language resources for EHR terms are very limited in the public domain. Because expanding and curating such a resource is a costly process, it is beneficial and even necessary to identify terms important for patient EHR comprehension first.
OBJECTIVE: We aimed to develop an NLP system, called adapted distant supervision (ADS), to rank candidate terms mined from EHR corpora. We will give EHR terms ranked as high by ADS a higher priority for lay language annotation, that is, creating lay definitions for these terms.
METHODS: Adapted distant supervision uses distant supervision from consumer health vocabulary and transfer learning to adapt itself to solve the problem of ranking EHR terms in the target domain. We investigated 2 state-of-the-art transfer learning algorithms (ie, feature space augmentation and supervised distant supervision) and designed 5 types of learning features, including distributed word representations learned from large EHR data for ADS. For evaluating ADS, we asked domain experts to annotate 6038 candidate terms as important or nonimportant for EHR comprehension. We then randomly divided these data into the target-domain training data (1000 examples) and the evaluation data (5038 examples). We compared ADS with 2 strong baselines, including standard supervised learning, on the evaluation data.
RESULTS: The ADS system using feature space augmentation achieved the best average precision, 0.850, on the evaluation set when using 1000 target-domain training examples. The ADS system using supervised distant supervision achieved the best average precision, 0.819, on the evaluation set when using only 100 target-domain training examples. The 2 ADS systems both performed significantly better than the baseline systems (P{\textless}.001 for all measures and all conditions). Using a rich set of learning features contributed to ADS's performance substantially.
CONCLUSIONS: ADS can effectively rank terms mined from EHRs. Transfer learning improved ADS's performance even with a small number of target-domain training examples. EHR terms prioritized by ADS were used to expand a lay language resource that supports patient EHR comprehension. The top 10,000 EHR terms ranked by ADS are available upon request.},
	language = {eng},
	number = {4},
	journal = {JMIR Medical Informatics},
	author = {Chen, Jinying and Jagannatha, Abhyuday N. and Fodeh, Samah J. and Yu, Hong},
	month = oct,
	year = {2017},
	pmid = {29089288},
	pmcid = {PMC5686421},
	keywords = {Information extraction, electronic health records, lexical entry selection, natural language processing, transfer learning},
	pages = {e42},
}
