Automated Medical Chart Review for Breast Cancer Outcomes Research: A Novel Natural Language Processing Extraction System. Chen, Y., Hao, L., Zou, V., Hollander, Z., Ng, R., & Isaac, K. (manuscript submitted to The Lancet: Digital Health), 2021.
Background: Manually extracted data points from health records are collated at the institutional, provincial, and national levels to facilitate clinical research. However, the labour-intensive clinical chart review process places an increasing burden on healthcare systems. Therefore, an automated information extraction system is needed to ensure the timeliness and scalability of research data. Methods: We used a dataset of 100 synoptic operative and 100 pathology reports, with each report type split evenly into training and validation sets of 50 reports each. The training set guided our development of a Natural Language Processing (NLP) extraction system, which accepts scanned images of operative and pathology reports. The system uses a combination of rule-based and transfer learning methods to extract numeric encodings from the text. We also developed visualization tools to compare the manual and automated extractions. Findings: The validation set of 50 operative and 50 pathology reports was used to compare the extraction accuracies of an MD student and the NLP system. The MD student reached 92.1% (operative) and 99.8% (pathology) accuracy, while the NLP system achieved 91.9% (operative) and 96.0% (pathology) accuracy. Interpretation: The NLP system achieves near-human-level accuracy on both operative and pathology reports. The results support the deployment of this NLP system in production settings. Its use cases include 1) substituting for human chart reviewers, 2) assisting human reviewers through encoding recommendations, and 3) measuring the accuracy of human extractions.
@article{chen_automated_2021,
	title = {Automated {Medical} {Chart} {Review} for {Breast} {Cancer} {Outcomes} {Research}: {A} {Novel} {Natural} {Language} {Processing} {Extraction} {System}},
	abstract = {Background: Manually extracted data points from health records are collated at the institutional, provincial, and national levels to facilitate clinical research. However, the labour-intensive clinical chart review process places an increasing burden on healthcare systems. Therefore, an automated information extraction system is needed to ensure the timeliness and scalability of research data.

Methods: We used a dataset of 100 synoptic operative and 100 pathology reports, with each report type split evenly into training and validation sets of 50 reports each. The training set guided our development of a Natural Language Processing (NLP) extraction system, which accepts scanned images of operative and pathology reports. The system uses a combination of rule-based and transfer learning methods to extract numeric encodings from the text. We also developed visualization tools to compare the manual and automated extractions.

Findings: The validation set of 50 operative and 50 pathology reports was used to compare the extraction accuracies of an MD student and the NLP system. The MD student reached 92.1\% (operative) and 99.8\% (pathology) accuracy, while the NLP system achieved 91.9\% (operative) and 96.0\% (pathology) accuracy.

Interpretation: The NLP system achieves near-human-level accuracy on both operative and pathology reports. The results support the deployment of this NLP system in production settings. Its use cases include 1) substituting for human chart reviewers, 2) assisting human reviewers through encoding recommendations, and 3) measuring the accuracy of human extractions.},
	journal = {(manuscript submitted to The Lancet: Digital Health)},
	author = {Chen, Yifu and Hao, Lucy and Zou, Vito and Hollander, Zsuzsanna and Ng, Raymond and Isaac, Kathryn},
	year = {2021},
}
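
The abstract describes, at a high level, a pipeline that combines rule-based and transfer-learning methods to turn scanned operative and pathology reports into numeric encodings. The manuscript itself is not reproduced here, so the sketch below only illustrates what the rule-based half of such a system might look like: a regular-expression rule that maps one field of a synoptic report to a numeric code. The field name, code table, and function are hypothetical and are not taken from the authors' implementation.

import re
from typing import Optional

# Hypothetical code table: the abstract does not describe the actual
# encoding scheme, so these values are illustrative only.
LATERALITY_CODES = {"left": 1, "right": 2, "bilateral": 3}

def extract_laterality(report_text: str) -> Optional[int]:
    """Return a numeric code for the 'Laterality' field of a synoptic report,
    or None if the field is absent or unrecognized."""
    match = re.search(r"laterality\s*:\s*(left|right|bilateral)",
                      report_text, flags=re.IGNORECASE)
    if match is None:
        return None
    return LATERALITY_CODES[match.group(1).lower()]

# Example: OCR'd text from a scanned synoptic report (made-up content).
sample = "SYNOPTIC REPORT\nLaterality: Left\nTumour size: 14 mm"
print(extract_laterality(sample))  # prints 1

In the system described in the abstract, rules of this kind would be complemented by transfer-learning models for fields whose wording varies too much for fixed patterns, and the resulting encodings would be compared against the manual chart review using the visualization tools mentioned in the Methods.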
