Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of its Successes and Shortcomings. Antaki, F., Touma, S., Milad, D., El-Khoury, J., & Duval, R. medRxiv, January 2023. Place: Cold Spring Harbor; Publisher: Cold Spring Harbor Laboratory Press.
We tested the accuracy of ChatGPT, a large language model (LLM), in the ophthalmology question-answering space using two popular multiple choice question banks used for the high-stakes Ophthalmic Knowledge Assessment Program (OKAP) exam. The testing sets were of easy-to-moderate difficulty and were diversified, including recall, interpretation, practical and clinical decision-making problems. ChatGPT achieved 55.8% and 42.7% accuracy in the two 260-question simulated exams. Its performance varied across subspecialties, with the best results in general medicine and the worst in neuro-ophthalmology and ophthalmic pathology and intraocular tumors. These results are encouraging but suggest that specialising LLMs through domain-specific pre-training may be necessary to improve their performance in ophthalmic subspecialties.
@article{antaki_evaluating_2023,
	title = {Evaluating the {Performance} of {ChatGPT} in {Ophthalmology}: {An} {Analysis} of its {Successes} and {Shortcomings}},
	url = {https://www.proquest.com/working-papers/evaluating-performance-chatgpt-ophthalmology/docview/2768841875/se-2},
	doi = {10.1101/2023.01.22.23284882},
	abstract = {We tested the accuracy of ChatGPT, a large language model (LLM), in the ophthalmology question-answering space using two popular multiple choice question banks used for the high-stakes Ophthalmic Knowledge Assessment Program (OKAP) exam. The testing sets were of easy-to-moderate difficulty and were diversified, including recall, interpretation, practical and clinical decision-making problems. ChatGPT achieved 55.8\% and 42.7\% accuracy in the two 260-question simulated exams. Its performance varied across subspecialties, with the best results in general medicine and the worst in neuro-ophthalmology and ophthalmic pathology and intraocular tumors. These results are encouraging but suggest that specialising LLMs through domain-specific pre-training may be necessary to improve their performance in ophthalmic subspecialties.},
	language = {English},
	journal = {medRxiv},
	author = {Antaki, Fares and Touma, Samir and Milad, Daniel and El-Khoury, Jonathan and Duval, Renaud},
	month = jan,
	year = {2023},
	note = {Place: Cold Spring Harbor; Publisher: Cold Spring Harbor Laboratory Press},
	keywords = {Medical Sciences, Decision making},
	annote = {Copyright - © 2023. This article is published under http://creativecommons.org/licenses/by-nd/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.},
	annote = {Last updated - 2023-01-27},
}
