Capabilities of Gemini Models in Medicine. Saab, K., Tu, T., Weng, W., Tanno, R., Stutz, D., Wulczyn, E., Zhang, F., Strother, T., Park, C., Vedadi, E., Chaves, J. Z., Hu, S., Schaekermann, M., Kamath, A., Cheng, Y., Barrett, D. G. T., Cheung, C., Mustafa, B., Palepu, A., McDuff, D., Hou, L., Golany, T., Liu, L., Alayrac, J., Houlsby, N., Tomasev, N., Freyberg, J., Lau, C., Kemp, J., Lai, J., Azizi, S., Kanada, K., Man, S., Kulkarni, K., Sun, R., Shakeri, S., He, L., Caine, B., Webson, A., Latysheva, N., Johnson, M., Mansfield, P., Lu, J., Rivlin, E., Anderson, J., Green, B., Wong, R., Krause, J., Shlens, J., Dominowska, E., Eslami, S. M. A., Chou, K., Cui, C., Vinyals, O., Kavukcuoglu, K., Manyika, J., Dean, J., Hassabis, D., Matias, Y., Webster, D., Barral, J., Corrado, G., Semturs, C., Mahdavi, S. S., Gottweis, J., Karthikesalingam, A., & Natarajan, V. May, 2024. arXiv:2404.18416 [cs]
Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.
@misc{saab_capabilities_2024,
title = {Capabilities of {Gemini} {Models} in {Medicine}},
url = {http://arxiv.org/abs/2404.18416},
doi = {10.48550/arXiv.2404.18416},
abstract = {Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1\% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health \& medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5\%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning. Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.},
urldate = {2024-05-04},
publisher = {arXiv},
author = {Saab, Khaled and Tu, Tao and Weng, Wei-Hung and Tanno, Ryutaro and Stutz, David and Wulczyn, Ellery and Zhang, Fan and Strother, Tim and Park, Chunjong and Vedadi, Elahe and Chaves, Juanma Zambrano and Hu, Szu-Yeu and Schaekermann, Mike and Kamath, Aishwarya and Cheng, Yong and Barrett, David G. T. and Cheung, Cathy and Mustafa, Basil and Palepu, Anil and McDuff, Daniel and Hou, Le and Golany, Tomer and Liu, Luyang and Alayrac, Jean-baptiste and Houlsby, Neil and Tomasev, Nenad and Freyberg, Jan and Lau, Charles and Kemp, Jonas and Lai, Jeremy and Azizi, Shekoofeh and Kanada, Kimberly and Man, SiWai and Kulkarni, Kavita and Sun, Ruoxi and Shakeri, Siamak and He, Luheng and Caine, Ben and Webson, Albert and Latysheva, Natasha and Johnson, Melvin and Mansfield, Philip and Lu, Jian and Rivlin, Ehud and Anderson, Jesper and Green, Bradley and Wong, Renee and Krause, Jonathan and Shlens, Jonathon and Dominowska, Ewa and Eslami, S. M. Ali and Chou, Katherine and Cui, Claire and Vinyals, Oriol and Kavukcuoglu, Koray and Manyika, James and Dean, Jeff and Hassabis, Demis and Matias, Yossi and Webster, Dale and Barral, Joelle and Corrado, Greg and Semturs, Christopher and Mahdavi, S. Sara and Gottweis, Juraj and Karthikesalingam, Alan and Natarajan, Vivek},
month = may,
year = {2024},
note = {arXiv:2404.18416 [cs]},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, notion},
}