Lifting data portals to the Web of Data. Neumaier, S., Umbrich, J., & Polleres, A. In 10th Workshop on Linked Data on the Web (LDOW2017), Perth, Australia, April, 2017.
Data portals are central hubs for freely available (governmental) datasets. These portals use different software frameworks to publish their data and the metadata descriptions of these datasets come in different schemas according to the used framework. The present work aims at re-exposing and connecting the metadata descriptions of currently 854k datasets on 261 data portals to the Web of Linked Data by mapping and publishing their homogenized metadata in standard vocabularies such as DCAT and Schema.org. Additionally, we publish existing quality information about the datasets and further enrich their descriptions by automatically generated metadata for CSV resources. In order to make all this information traceable and trustworthy, we annotate the generated data using W3C’s provenance vocabulary. The dataset descriptions are harvested weekly and we offer access to the archived data by providing APIs compliant to the Memento framework. All this data – a total of about 120 million triples per weekly snapshot – is queryable at the SPARQL endpoint at http://data.wu.ac.at/portalwatch/sparql.
@inproceedings{neum-etal-LDOW2017,
   title = {Lifting data portals to the Web of Data},
   author = {Sebastian Neumaier and J{\"u}rgen Umbrich and Axel Polleres},
   abstract = {Data portals are central hubs for freely available (governmental) datasets. These portals use different software frameworks to publish their data and the metadata descriptions of these datasets come in different schemas according to the used framework. The present work aims at re-exposing and connecting the metadata descriptions of currently 854k datasets on 261 data portals to the Web of Linked Data by mapping and publishing their homogenized metadata in standard vocabularies such as DCAT and Schema.org. Additionally, we publish existing quality information about the datasets and further enrich their descriptions by automatically generated metadata for CSV resources. In order to make all this information traceable and trustworthy, we annotate the generated data using W3C’s provenance vocabulary. The dataset descriptions are harvested weekly and we offer access to the archived data by providing APIs compliant to the Memento framework. All this data -- a total of about 120 million triples per weekly snapshot -- is queryable at the SPARQL endpoint at \url{http://data.wu.ac.at/portalwatch/sparql}.},
  year = 2017,
  booktitle = {10th Workshop on Linked Data on the Web (LDOW2017)},
  address = {Perth, Australia},
  day = 3,
  month = apr,
  url = {http://polleres.net/publications/neum-etal-LDOW2017.pdf}
}