D-REPR: A Language for Describing and Mapping Diversely-Structured Data Sources to RDF.
Vu, B.; Pujara, J.; and Knoblock, C. A.
In Proceedings of the 10th International Conference on Knowledge Capture, K-CAP '19, pages 189–196, New York, NY, USA, September 2019. Association for Computing Machinery.
@inproceedings{10.1145/3360901.3364449,
  author = {Vu, Binh and Pujara, Jay and Knoblock, Craig A.},
  title = {D-REPR: A Language for Describing and Mapping Diversely-Structured Data Sources to RDF},
  month = {September},
  year = {2019},
  isbn = {9781450370080},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3360901.3364449},
  doi = {10.1145/3360901.3364449},
  abstract = {Publishing data sources to knowledge graphs is a complicated and laborious process as data sources are often heterogeneous, hierarchical and interlinked. As an example, food price datasets may contain product prices of various units at different markets and times, and different providers can have many choices of formats such as CSV, JSON or spreadsheet. Beyond data formats, these datasets may have differing layout, where one dataset may be organized as a row-based table or relational table (prices are in one column), while another may use a matrix table (prices are in one matrix). To address these problems, we present a novel data description language for mapping datasets to RDF. In particular, our language supports specifying the locations of source attributes in the sources, mapping of the attributes to ontologies, and simple rules to join the data of these attributes to output final RDF triples. Unlike existing approaches, our language is not restricted to specific data layouts such as the Nested Relational Model, or to specific data formats, such as spreadsheet. Our broad data description language presents a format-independent solution, allowing interlinking among multiple heterogeneous sources and representing many diverse data structures that existing tools are unable to handle.},
  booktitle = {Proceedings of the 10th International Conference on Knowledge Capture},
  pages = {189--196},
  numpages = {8},
  keywords = {linked data, knowledge graph, rdf mapping},
  location = {Marina Del Rey, CA, USA},
  series = {K-CAP '19}
}
Learning Semantic Models of Data Sources Using Probabilistic Graphical Models.
Vu, B.; Knoblock, C.; and Pujara, J.
In The World Wide Web Conference, WWW '19, pages 1944–1953, New York, NY, USA, May 2019. Association for Computing Machinery.
@inproceedings{10.1145/3308558.3313711,
  author = {Vu, Binh and Knoblock, Craig and Pujara, Jay},
  title = {Learning Semantic Models of Data Sources Using Probabilistic Graphical Models},
  month = {May},
  year = {2019},
  isbn = {9781450366748},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3308558.3313711},
  doi = {10.1145/3308558.3313711},
  abstract = {A semantic model of a data source is a representation of the concepts and relationships contained in the data. Building semantic models is a prerequisite to automatically publishing data to a knowledge graph. However, creating these semantic models is a complex process requiring considerable manual effort and can be error-prone. In this paper, we present a novel approach that efficiently searches over the combinatorial space of possible semantic models, and applies a probabilistic graphical model to identify the most probable semantic model for a data source. Probabilistic graphical models offer many advantages over existing methods: they are robust to noisy inputs and provide a straightforward approach for exploiting relationships within the data. Our solution uses a conditional random field (CRF) to encode structural patterns and enforce conceptual consistency within the semantic model. In an empirical evaluation, our approach outperforms state of the art systems by an average 8.4\% of F1 score, even with noisy input data.},
  booktitle = {The World Wide Web Conference},
  pages = {1944--1953},
  numpages = {10},
  keywords = {semantic web, probabilistic graphical models, semantic models, ontology, knowledge graph, linked data},
  location = {San Francisco, CA, USA},
  series = {WWW '19}
}