Binary RDF Representation for Publication and Exchange (HDT). Fernández, J. D., Martínez-Prieto, M. A., Gutiérrez, C., Polleres, A., & Arias, M. Journal of Web Semantics (JWS), Elsevier, 2013.
The current Web of Data is producing increasingly large RDF data sets. Massive publication efforts of RDF data driven by initiatives like the Linked Open Data movement, and the need to exchange large data sets has unveiled the drawbacks of traditional RDF representations, inspired and designed by a document-centric and human-readable Web. Among the main problems are high levels of verbosity/redundancy and weak machine-processable capabilities in the description of these data sets. This scenario calls for efficient formats for publication and exchange. This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterizes the skewed structure of real-world RDF data, we develop a proposal of an RDF representation that modularly partitions and efficiently represents three components of RDF data sets: Header information, a Dictionary, and the actual Triples structure (thus called HDT). Our experimental evaluation shows that data sets in HDT format can be compacted by more than fifteen times as compared to current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques over HDT further improve these compression rates and prove to outperform existing compression solutions for efficient RDF exchange.
@article{fern-etal-2013-HDT-JWS,
	Abstract = {The current Web of Data is producing increasingly large RDF data sets. Massive publication efforts of RDF data driven by initiatives like the Linked Open Data movement, and the need to exchange large data sets has unveiled the drawbacks of traditional RDF representations, inspired and designed by a document-centric and human-readable Web. Among the main problems are high levels of verbosity/redundancy and weak machine-processable capabilities in the description of these data sets. This scenario calls for efficient formats for publication and exchange. This article presents a binary RDF representation addressing these issues. Based on a set of metrics that characterizes the skewed structure of real-world RDF data, we develop a proposal of an RDF representation that modularly partitions and efficiently represents three components of RDF data sets: Header information, a Dictionary, and the actual Triples structure (thus called HDT).
Our experimental evaluation shows that data sets in HDT format can be compacted by more than fifteen times as compared to current naive representations, improving both parsing and processing while keeping a consistent publication scheme. Specific compression techniques over HDT further improve these compression rates and prove to outperform existing compression solutions for efficient RDF exchange.},
	Author = {Javier D. Fern{\'a}ndez and Miguel A. Mart{\'\i}nez-Prieto and Claudio Guti{\'e}rrez and Axel Polleres and Mario Arias},
	Journal = JWS,
	Number = 2,
	Publisher = {Elsevier},
	Title = {{Binary RDF Representation for Publication and Exchange (HDT)}},
	Type = JOURNAL,
	Url = {http://www.websemanticsjournal.org/index.php/ps/article/view/328},
	Volume = 19,
	Year = 2013,
	Bdsk-Url-1 = {http://www.websemanticsjournal.org/index.php/ps/article/view/328}}