SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM. Qiu, J., Madotto, A., Lin, Z., Crook, P. A., Xu, Y. E., Damavandi, B., Dong, X., Faloutsos, C., Li, L., & Moon, S. In Al-Onaizan, Y., Bansal, M., & Chen, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12-16, 2024, pages 247–266, 2024. Association for Computational Linguistics.
@inproceedings{DBLP:conf/emnlp/QiuMLCXD0F0M24,
  author       = {Jielin Qiu and
                  Andrea Madotto and
                  Zhaojiang Lin and
                  Paul A. Crook and
                  Yifan Ethan Xu and
                  Babak Damavandi and
                  Xin Dong and
                  Christos Faloutsos and
                  Lei Li and
                  Seungwhan Moon},
  editor       = {Yaser Al{-}Onaizan and
                  Mohit Bansal and
                  Yun{-}Nung Chen},
  title        = {SnapNTell: Enhancing Entity-Centric Visual Question Answering with
                  Retrieval Augmented Multimodal {LLM}},
  booktitle    = {Findings of the Association for Computational Linguistics: {EMNLP}
                  2024, Miami, Florida, USA, November 12-16, 2024},
  pages        = {247--266},
  publisher    = {Association for Computational Linguistics},
  year         = {2024},
  url          = {https://aclanthology.org/2024.findings-emnlp.14},
  timestamp    = {Mon, 18 Nov 2024 09:05:59 +0100},
  biburl       = {https://dblp.org/rec/conf/emnlp/QiuMLCXD0F0M24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
