Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models. Zhang, J., Khayatkhoei, M., Chhikara, P., & Ilievski, F. In Advances in Neural Information Processing Systems Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models, December, 2023.
@InProceedings{zhang2023visual-crop,
  title={Visual Cropping Improves Zero-Shot Question Answering of Multimodal Large Language Models},
  author={Zhang, Jiarui and Khayatkhoei, Mahyar and Chhikara, Prateek and Ilievski, Filip},
  booktitle={Advances in Neural Information Processing Systems Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models},
  year={2023},
  month={December},
  url_Paper={https://openreview.net/pdf?id=YrYcoV2dAk},
  url_Link={https://neurips.cc/virtual/2023/76680},
  ISIArea={ML, VISTA, NLP}
}
