Improving Search Results in the LOD with Entity Resolution. De Assis Costa, G. and De Oliveira, J. Proceedings of the Congresso Linked Open Data Brasil 2014, 2014.
Expressive query capabilities can be achieved through the traversal of links between different data sources. In this sense, the ability to aggregate data could be a differentiator for some Linked Data search engines when crawling the Web of Data. However, there are many challenges, especially when considering the different types, structures and vocabularies used on the Web. Besides that, it is difficult to guarantee data quality, because the data are usually incomplete, inconsistent and contain outliers. To overcome some of these problems, many works have applied Entity Resolution (ER) using different techniques and algorithms. In this paper we present an overview of our experience in building an approach to integrate datasets, aimed at improving search results over the LOD. In addition to a general description of the approach and its main aspects, we give a brief overview of ER and some trends regarding the potential of this technique.
Introduction
Following the trend of the World Wide Web, a considerable number of people and companies have published their data on the Web of Data (HEATH and BIZER, 2011). As a result, the amount and variety of data is growing exponentially, creating a graph of global dimensions formed by billions of RDF triples that represent data from different fields of knowledge. RDF data sources are commonly created by converting structured data from relational databases, or semi-structured and unstructured data crawled from web pages, texts and other types of documents. Because these sources typically present problems such as outliers, duplication, inconsistency and schema heterogeneity, the derived data inherit the same problems. These problems are among the factors that limit the effective integration and sharing of Linked Data.
To deal with these problems, inductive methods from machine learning and data mining have been successfully employed to perform approximate reasoning and to derive predictions that are neither explicitly asserted in the knowledge base nor provable by logical reasoning (RETTINGER et al., 2012; TRESP et al., 2008). Typical tasks include classification (object type prediction or property value prediction), link prediction, clustering and ER. In addition to these approaches, the Semantic Web community recognizes instance-level ER (BIZER, HEATH and BERNERS-LEE, 2009). Such methods often apply similarity metrics between entities, based on established techniques from the database community, such as record linkage or de-duplication (FELLEGI and SUNTER, 1969), and from the ontology community, such as ontology matching (EUZENAT and SHVAIKO, 2007).
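The following minimal sketch shows similarity-based instance-level ER in the record-linkage style attributed above to Fellegi and Sunter (1969). It is an illustration only and is not the authors' approach. The example makes several assumptions:

- Each entity description is flattened to a dictionary mapping properties to literals.
- Two values "agree" when their token Jaccard similarity reaches a threshold.
- Each agreement or disagreement contributes the Fellegi–Sunter log-weight log2(m/u) or log2((1-m)/(1-u)).

The entity URIs, property names, m/u probabilities and thresholds are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical per-property probabilities:
# m = P(values agree | same entity), u = P(values agree | different entities).
M_U = {"name": (0.95, 0.05), "birthYear": (0.9, 0.1)}


def tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens of a literal value."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two literals."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def fs_weight(e1: dict, e2: dict, agree_threshold: float = 0.5) -> float:
    """Sum Fellegi-Sunter log2 weights over the properties both entities share."""
    total = 0.0
    for prop, (m, u) in M_U.items():
        if prop not in e1 or prop not in e2:
            continue  # a missing value contributes no evidence either way
        if jaccard(e1[prop], e2[prop]) >= agree_threshold:
            total += math.log2(m / u)  # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def resolve(entities: dict[str, dict], upper: float = 3.0) -> list[tuple[str, str]]:
    """Return pairs of entity URIs whose total weight exceeds the match threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(entities), 2)
        if fs_weight(entities[a], entities[b]) > upper
    ]


# Hypothetical descriptions of the same person in two sources, plus a distinct one.
entities = {
    "ex:a": {"name": "Ada Lovelace", "birthYear": "1815"},
    "ex:b": {"name": "ada lovelace", "birthYear": "1815"},
    "ex:c": {"name": "Alan Turing", "birthYear": "1912"},
}
print(resolve(entities))  # [('ex:a', 'ex:b')]
```

Comparing every pair is quadratic in the number of entities. A realistic LOD pipeline would typically add a blocking step before scoring, and would estimate m and u from data (for example, with EM) rather than fixing them by hand.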
@article{DeAssisCosta2014,
 title = {Improving Search Results in the {LOD} with Entity Resolution},
 type = {article},
 year = {2014},
 keywords = {Entity Resolution,Linked Data,Semantic Web},
 pages = {12},
 id = {3f1119e0-7a49-319d-bc3e-2030d5df6e96},
 created = {2018-09-12T14:43:40.104Z},
 file_attached = {false},
 profile_id = {4519b08f-eb67-3c45-be51-5ea1a2c16093},
 group_id = {c53614b8-0822-3f60-9001-f72394b88ff8},
 last_modified = {2018-09-12T14:43:40.104Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 private_publication = {false},
 bibtype = {article},
 author = {De Assis Costa, G. and De Oliveira, J.M.P.},
 journal = {Proceedings of the Congresso Linked Open Data Brasil 2014},
 month = {mar}
}