AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding. Suglia, A., Greco, C., Baker, K., Part, J. L., Papaioannou, I., Eshghi, A., Konstas, I., & Lemon, O. In Al-Onaizan, Y., Bansal, M., & Chen, Y., editors, Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, Florida, USA, November 12–16, 2024, pages 11101–11122, 2024. Association for Computational Linguistics.
@inproceedings{DBLP:conf/emnlp/Suglia0BPPEKL24,
  author       = {Alessandro Suglia and
                  Claudio Greco and
                  Katie Baker and
                  Jose L. Part and
                  Ioannis Papaioannou and
                  Arash Eshghi and
                  Ioannis Konstas and
                  Oliver Lemon},
  editor       = {Yaser Al{-}Onaizan and
                  Mohit Bansal and
                  Yun{-}Nung Chen},
  title        = {AlanaVLM: {A} Multimodal Embodied {AI} Foundation Model for Egocentric
                  Video Understanding},
  booktitle    = {Findings of the Association for Computational Linguistics: {EMNLP}
                  2024, Miami, Florida, USA, November 12-16, 2024},
  pages        = {11101--11122},
  publisher    = {Association for Computational Linguistics},
  year         = {2024},
  url          = {https://doi.org/10.18653/v1/2024.findings-emnlp.649},
  doi          = {10.18653/v1/2024.findings-emnlp.649},
  timestamp    = {Fri, 13 Jun 2025 01:00:00 +0200},
  biburl       = {https://dblp.org/rec/conf/emnlp/Suglia0BPPEKL24.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}