{"_id":"Lwr4hpb7Yz82TAmHo","bibbaseid":"adamson-waskom-blarre-kelly-krismer-nemeth-gipetti-ritten-etal-approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords-2023","author_short":["Adamson, B. J","Waskom, M.","Blarre, A.","Kelly, J.","Krismer, K.","Nemeth, S.","Gipetti, J.","Ritten, J.","Harrison, K.","Ho, G.","Linzmayer, R.","Bansal, T.","Wilkinson, S.","Amster, G.","Estola, E.","Benedum, C. M","Fidyk, E.","Estevez, M.","Shapiro, W.","Cohen, A. B"],"bibdata":{"bibtype":"article","type":"article","title":"Approach to Machine Learning for Extraction of Real-World Data Variables from Electronic Health Records","url":"https://www.proquest.com/working-papers/approach-machine-learning-extraction-real-world/docview/2783519677/se-2","doi":"10.1101/2023.03.02.23286522","abstract":"Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI's ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (eg, clinician notes, radiology reports, lab reports, etc.) to output a set of structured variables required for RWD analysis. This research used a nationwide electronic health record (EHR)-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely based on an ML model (ie, not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. 
Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manually abstracted data. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusions: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.","language":"English","journal":"MedRxiv","author":[{"propositions":[],"lastnames":["Adamson"],"firstnames":["Blythe","J"],"suffixes":[]},{"propositions":[],"lastnames":["Waskom"],"firstnames":["Michael"],"suffixes":[]},{"propositions":[],"lastnames":["Blarre"],"firstnames":["Auriane"],"suffixes":[]},{"propositions":[],"lastnames":["Kelly"],"firstnames":["Jonathan"],"suffixes":[]},{"propositions":[],"lastnames":["Krismer"],"firstnames":["Konstantin"],"suffixes":[]},{"propositions":[],"lastnames":["Nemeth"],"firstnames":["Sheila"],"suffixes":[]},{"propositions":[],"lastnames":["Gipetti"],"firstnames":["James"],"suffixes":[]},{"propositions":[],"lastnames":["Ritten"],"firstnames":["John"],"suffixes":[]},{"propositions":[],"lastnames":["Harrison"],"firstnames":["Katherine"],"suffixes":[]},{"propositions":[],"lastnames":["Ho"],"firstnames":["George"],"suffixes":[]},{"propositions":[],"lastnames":["Linzmayer"],"firstnames":["Robin"],"suffixes":[]},{"propositions":[],"lastnames":["Bansal"],"firstnames":["Tarun"],"suffixes":[]},{"propositions":[],"lastnames":["Wilkinson"],"firstnames":["Samuel"],"suffixes":[]},{"propositions":[],"lastnames":["Amster"],"firstnames":["Guy"],"suffixes":[]},{"propositions":[],"lastnames":["Estola"],"firstnames":["Evan"],"suffixes":[]},{"proposi
tions":[],"lastnames":["Benedum"],"firstnames":["Corey","M"],"suffixes":[]},{"propositions":[],"lastnames":["Fidyk"],"firstnames":["Erin"],"suffixes":[]},{"propositions":[],"lastnames":["Estevez"],"firstnames":["Melissa"],"suffixes":[]},{"propositions":[],"lastnames":["Shapiro"],"firstnames":["Will"],"suffixes":[]},{"propositions":[],"lastnames":["Cohen"],"firstnames":["Aaron","B"],"suffixes":[]}],"month":"March","year":"2023","note":"Place: Cold Spring Harbor Publisher: Cold Spring Harbor Laboratory Press","keywords":"Artificial intelligence, Machine learning, Medical Sciences, Models, Diagnosis, Electronic health records, Electronic medical records, Learning algorithms, Metastases, Research & development–R&D","annote":"Last updated - 2023-03-07","bibtex":"@article{adamson_approach_2023,\n\ttitle = {Approach to {Machine} {Learning} for {Extraction} of {Real}-{World} {Data} {Variables} from {Electronic} {Health} {Records}},\n\turl = {https://www.proquest.com/working-papers/approach-machine-learning-extraction-real-world/docview/2783519677/se-2},\n\tdoi = {10.1101/2023.03.02.23286522},\n\tabstract = {Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI's ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (eg, clinician notes, radiology reports, lab reports, etc.) to output a set of structured variables required for RWD analysis. This research used a nationwide electronic health record (EHR)-derived database. Models were selected based on performance. 
Variables curated with an approach using ML extraction are those where the value is determined solely based on an ML model (ie, not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manually abstracted data. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusions: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.},\n\tlanguage = {English},\n\tjournal = {MedRxiv},\n\tauthor = {Adamson, Blythe J and Waskom, Michael and Blarre, Auriane and Kelly, Jonathan and Krismer, Konstantin and Nemeth, Sheila and Gipetti, James and Ritten, John and Harrison, Katherine and Ho, George and Linzmayer, Robin and Bansal, Tarun and Wilkinson, Samuel and Amster, Guy and Estola, Evan and Benedum, Corey M and Fidyk, Erin and Estevez, Melissa and Shapiro, Will and Cohen, Aaron B},\n\tmonth = mar,\n\tyear = {2023},\n\tnote = {Place: Cold Spring Harbor\nPublisher: Cold Spring Harbor Laboratory Press},\n\tkeywords = {Artificial intelligence, Machine learning, Medical Sciences, Models, Diagnosis, Electronic health records, Electronic medical records, Learning algorithms, Metastases, Research \\& development--R\\&D},\n\tannote = {Copyright - © 2023. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (“the License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.},\n\tannote = {Last updated - 2023-03-07},\n}\n\n","author_short":["Adamson, B. J","Waskom, M.","Blarre, A.","Kelly, J.","Krismer, K.","Nemeth, S.","Gipetti, J.","Ritten, J.","Harrison, K.","Ho, G.","Linzmayer, R.","Bansal, T.","Wilkinson, S.","Amster, G.","Estola, E.","Benedum, C. M","Fidyk, E.","Estevez, M.","Shapiro, W.","Cohen, A. B"],"key":"adamson_approach_2023","id":"adamson_approach_2023","bibbaseid":"adamson-waskom-blarre-kelly-krismer-nemeth-gipetti-ritten-etal-approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords-2023","role":"author","urls":{"Paper":"https://www.proquest.com/working-papers/approach-machine-learning-extraction-real-world/docview/2783519677/se-2"},"keyword":["Artificial intelligence","Machine learning","Medical Sciences","Models","Diagnosis","Electronic health records","Electronic medical records","Learning algorithms","Metastases","Research & development–R&D"],"metadata":{"authorlinks":{}}},"bibtype":"article","biburl":"https://bibbase.org/network/files/22WYpzbBvi3hDHX7Y","dataSources":["cYu6uhMkeFHgRrEty","hLMh7bwHyFsPNWAEL","LKW3iRvnztCpLNTW7","TLD9JxqHfSQQ4r268","X9BvByJrC3kGJexn8","iovNvcnNYDGJcuMq2","NjZJ5ZmWhTtMZBfje"],"keywords":["artificial intelligence","machine learning","medical sciences","models","diagnosis","electronic health records","electronic medical records","learning algorithms","metastases","research & development–r&d"],"search_terms":["approach","machine","learning","extraction","real","world","data","variables","electronic","health","records","adamson","waskom","blarre","kelly","krismer","nemeth","gipetti","ritten","harrison","ho","linzmayer","bansal","wilkinson","amster","estola","benedum","fidyk","estevez","shapiro","cohen"],"title":"Approach to Machine Learning for Extraction of Real-World Data Variables from Electronic Health Records","year":2023}