Explicit Knowledge Integration for Knowledge-Aware Visual Question Answering about Named Entities. Adjali, O., Grimal, P., Ferret, O., Ghannay, S., & Le Borgne, H. In International Conference on Multimedia Retrieval (ICMR), 2023.
@inproceedings{adjali2023icmr,
  title     = {Explicit Knowledge Integration for Knowledge-Aware Visual Question Answering about Named Entities},
  author    = {Adjali, Omar and Grimal, Paul and Ferret, Olivier and Ghannay, Sahar and {Le Borgne}, Herv{\'e}},
  booktitle = {International Conference on Multimedia Retrieval (ICMR)},
  location  = {Thessaloniki, Greece},
  year      = {2023},
  url_HAL   = {https://universite-paris-saclay.hal.science/cea-04172061/},
  url       = {https://dl.acm.org/doi/abs/10.1145/3591106.3592227},
  doi       = {10.1145/3591106.3592227},
  abstract  = {Recent years have shown an unprecedented growth of interest in Vision-Language related tasks, with the need to address the inherent challenges of integrating linguistic and visual information to solve real-world applications. Such a typical task is Visual Question Answering (VQA), which aims at answering questions about visual content. The limitations of the VQA task in terms of question redundancy and poor linguistic variability encouraged researchers to propose Knowledge-aware Visual Question Answering tasks as a natural extension of VQA. In this paper, we tackle the KVQAE (Knowledge-based Visual Question Answering about named Entities) task, which proposes to answer questions about named entities defined in a knowledge base and grounded in visual content. In particular, besides the textual and visual information, we propose to leverage the structural information extracted from syntactic dependency trees and external knowledge graphs to help answer questions about a large spectrum of entities of various types. Thus, by combining contextual and graph-based representations using Graph Convolutional Networks (GCNs), we are able to learn meaningful embeddings for information retrieval tasks. Experiments on the KVQAE public dataset show how our approach improves the state-of-the-art baselines while demonstrating the interest of injecting external knowledge to enhance multimodal information retrieval.},
  keywords  = {kvqae}
}