A Graph-Based Approach for Inferring Semantic Descriptions of Wikipedia Tables. Vu, B., Knoblock, C. A., Szekely, P., Pham, M., & Pujara, J. In Hotho, A., Blomqvist, E., Dietze, S., Fokoue, A., Ding, Y., Barnaghi, P., Haller, A., Dragoni, M., & Alani, H., editors, The Semantic Web – ISWC 2021, pages 304–320, 2021. Springer International Publishing.
There are millions of high-quality tables available in Wikipedia. These tables cover many domains and contain useful information. To make use of these tables for data discovery or data integration, we need precise descriptions of the concepts and relationships in the data, known as semantic descriptions. However, creating semantic descriptions is a complex process requiring considerable manual effort and can be error prone. In this paper, we present a novel probabilistic approach for automatically building semantic descriptions of Wikipedia tables. Our approach leverages hyperlinks in a Wikipedia table and existing knowledge in Wikidata to construct a graph of possible relationships in the table and its context, and then it uses collective inference to distinguish genuine and spurious relationships to form the final semantic description. In contrast to existing methods, our solution can handle tables that require complex semantic descriptions of n-ary relations (e.g., the population of a country in a particular year) or implicit contextual values to describe the data accurately. In our empirical evaluation, our approach outperforms state-of-the-art systems on the SemTab2020 dataset and outperforms those systems by as much as 28% in F1 score on a large set of Wikipedia tables.
@InProceedings{10.1007/978-3-030-88361-4_18,
author="Vu, Binh and Knoblock, Craig A. and Szekely, Pedro and Pham, Minh and Pujara, Jay",
editor="Hotho, Andreas and Blomqvist, Eva and Dietze, Stefan and Fokoue, Achille and Ding, Ying and Barnaghi, Payam and Haller, Armin and Dragoni, Mauro and Alani, Harith",
title="A Graph-Based Approach for Inferring Semantic Descriptions of Wikipedia Tables",
booktitle="The Semantic Web -- ISWC 2021",
year="2021",
publisher="Springer International Publishing",
pages="304--320",
abstract="There are millions of high-quality tables available in Wikipedia. These tables cover many domains and contain useful information. To make use of these tables for data discovery or data integration, we need precise descriptions of the concepts and relationships in the data, known as semantic descriptions. However, creating semantic descriptions is a complex process requiring considerable manual effort and can be error prone. In this paper, we present a novel probabilistic approach for automatically building semantic descriptions of Wikipedia tables. Our approach leverages hyperlinks in a Wikipedia table and existing knowledge in Wikidata to construct a graph of possible relationships in the table and its context, and then it uses collective inference to distinguish genuine and spurious relationships to form the final semantic description. In contrast to existing methods, our solution can handle tables that require complex semantic descriptions of n-ary relations (e.g., the population of a country in a particular year) or implicit contextual values to describe the data accurately. In our empirical evaluation, our approach outperforms state-of-the-art systems on the SemTab2020 dataset and outperforms those systems by as much as 28{\%} in F1 score on a large set of Wikipedia tables.",
isbn="978-3-030-88361-4",
URLslides = "http://usc-isi-i2.github.io/slides/vu-iswc21-slides.pptx"
}
