Automated Quality Assessment of Metadata across Open Data Portals. Neumaier, S., Umbrich, J., & Polleres, A. ACM Journal of Data and Information Quality (JDIQ), 8(1):2, November 2016.
The Open Data movement has become a driver for publicly available data on the Web. More and more data – from governments, public institutions but also from the private sector – is made available online and is mainly published in so-called Open Data portals. However, with the increasing number of published resources, there are a number of concerns with regard to the quality of the data sources and the corresponding metadata, which compromise the searchability, discoverability and usability of resources.
In order to get a more complete picture of the severity of these issues, the present work aims at developing a generic metadata quality assessment framework for various Open Data portals: we treat data portals independently from the portal software frameworks by mapping the specific metadata of three widely used portal software frameworks (CKAN, Socrata, OpenDataSoft) to the standardized DCAT metadata schema. We subsequently define several quality metrics, which can be evaluated automatically and in an efficient manner. Finally, we report findings based on monitoring a set of over 260 Open Data portals with 1.1M datasets. This includes the discussion of general quality issues, e.g. the retrievability of data, and the analysis of our specific quality metrics.
@article{neum-etal-2016JDIQ,
author = {Neumaier, Sebastian and Umbrich, J\"urgen and Polleres, Axel},
journal = {ACM Journal of Data and Information Quality (JDIQ)},
keyword = {open data, quality assessment},
abstract = {The Open Data movement has become a driver for publicly available data on the Web. More and more data -- from governments, public institutions but also from the private sector -- is made available online and is mainly published in so-called Open Data portals. However, with the increasing number of published resources, there are a number of concerns with regard to the quality of the data sources and the corresponding metadata, which compromise the searchability, discoverability and usability of resources.
In order to get a more complete picture of the severity of these issues, the present work aims at developing a generic metadata quality assessment framework for various Open Data portals: we treat data portals independently from the portal software frameworks by mapping the specific metadata of three widely used portal software frameworks (CKAN, Socrata, OpenDataSoft) to the standardized DCAT metadata schema. We subsequently define several quality metrics, which can be evaluated automatically and in an efficient manner. Finally, we report findings based on monitoring a set of over 260 Open Data portals with 1.1M datasets. This includes the discussion of general quality issues, e.g. the retrievability of data, and the analysis of our specific quality metrics.},
volume = 8,
number = 1,
pages = 2,
url = {http://polleres.net/publications/neum-etal-2016JDIQ.pdf},
title = {Automated Quality Assessment of Metadata across Open Data Portals},
year = {2016},
month = nov,
doi = {10.1145/2964909},
}