Literally better: Analyzing and improving the quality of literals

Beek, W., Ilievski, F., Debattista, J., Schlobach, S., & Wielemaker, J. Semantic Web, 9(1):131–150, IOS Press, 2018.

Quality is a complicated and multifarious topic in contemporary Linked Data research. The aspect of literal quality in particular has not yet been rigorously studied. Nevertheless, analyzing and improving the quality of literals is important since literals form a substantial (one in seven statements) and crucial part of the Semantic Web. Specifically, literals allow infinite value spaces to be expressed and they provide the linguistic entry point to the LOD Cloud. We present a toolchain that builds on the LOD Laundromat data cleaning and republishing infrastructure and that allows us to analyze the quality of literals on a very large scale, using a collection of quality criteria we specify in a systematic way. We illustrate the viability of our approach by lifting out two particular aspects in which the current LOD Cloud can be immediately improved by automated means: value canonization and language tagging. Since not all quality aspects can be addressed algorithmically, we also give an overview of other problems that can be used to guide future endeavors in tooling, training, and best practice formulation.
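To make "value canonization" concrete, the following is a minimal sketch and not the authors' toolchain. It assumes that canonization means mapping each valid lexical form of a datatyped literal to the single canonical lexical form defined by XML Schema, and it covers only `xsd:boolean` and `xsd:integer`. The function name `canonical_lexical_form` and its return convention are hypothetical choices made for this illustration.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical space of xsd:integer: optional sign followed by ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def canonical_lexical_form(lexical, datatype):
    """Return the canonical lexical form of a literal.

    Hypothetical helper for illustration only. Returns None when the
    lexical form is not valid for the datatype, and returns the input
    unchanged for datatypes this sketch does not cover. Surrounding
    whitespace is not normalized here.
    """
    if datatype == XSD + "boolean":
        # "1" and "0" are valid but non-canonical spellings.
        if lexical in ("true", "1"):
            return "true"
        if lexical in ("false", "0"):
            return "false"
        return None
    if datatype == XSD + "integer":
        if not _INTEGER.fullmatch(lexical):
            return None
        # int() drops the "+" sign and leading zeros; "-0" becomes "0".
        return str(int(lexical))
    # Assumption: anything else is passed through untouched.
    return lexical


if __name__ == "__main__":
    for lex, dt in [("+01", "integer"), ("1", "boolean"), ("abc", "integer")]:
        print(repr(lex), dt, "->", repr(canonical_lexical_form(lex, XSD + dt)))
```

Under this reading, `"+01"^^xsd:integer` and `"1"^^xsd:integer` denote the same value but only the latter is canonical, so rewriting the former is a safe, fully automatic improvement. A literal such as `"abc"^^xsd:integer` has no value at all and cannot be repaired this way.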

@article{7fcc8392689e4c6f8bc5c8ae55eb89a2,
  title     = "Literally better: Analyzing and improving the quality of literals",
  abstract  = "Quality is a complicated and multifarious topic in contemporary Linked Data research. The aspect of literal quality in particular has not yet been rigorously studied. Nevertheless, analyzing and improving the quality of literals is important since literals form a substantial (one in seven statements) and crucial part of the Semantic Web. Specifically, literals allow infinite value spaces to be expressed and they provide the linguistic entry point to the LOD Cloud. We present a toolchain that builds on the LOD Laundromat data cleaning and republishing infrastructure and that allows us to analyze the quality of literals on a very large scale, using a collection of quality criteria we specify in a systematic way. We illustrate the viability of our approach by lifting out two particular aspects in which the current LOD Cloud can be immediately improved by automated means: value canonization and language tagging. Since not all quality aspects can be addressed algorithmically, we also give an overview of other problems that can be used to guide future endeavors in tooling, training, and best practice formulation.",
  keywords  = "data observatory, Data quality, linked data, quality assessment, quality improvement",
  author    = "Wouter Beek and Filip Ilievski and Jeremy Debattista and Stefan Schlobach and Jan Wielemaker",
  year      = "2018",
  doi       = "10.3233/SW-170288",
  volume    = "9",
  pages     = "131--150",
  journal   = "Semantic Web",
  issn      = "1570-0844",
  publisher = "IOS Press",
  number    = "1",
}
