Web object retrieval

Web object retrieval. Nie, Z., Ma, Y., Shi, S., Wen, J., & Ma, W. Proceedings of the 16th International Conference on World Wide Web (WWW '07), ACM Press, 2007.

The primary function of current Web search engines is essentially relevance ranking at the document level. However, a large amount of structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.

@article{Nie2007,
 title = {Web object retrieval},
 type = {article},
 year = {2007},
 keywords = {information,information extraction,web objects},
 pages = {81},
 websites = {http://portal.acm.org/citation.cfm?doid=1242572.1242584},
 publisher = {ACM Press},
 id = {7250601f-6f39-3011-978b-a5861a48ebfa},
 created = {2011-02-27T18:33:21.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Nie2007},
 private_publication = {false},
 abstract = {The primary function of current Web search engines is essentially relevance ranking at the document level. However, a large amount of structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.},
 bibtype = {article},
 author = {Nie, Zaiqing and Ma, Yunxiao and Shi, Shuming and Wen, Ji-Rong and Ma, Wei-Ying},
 journal = {Proceedings of the 16th International Conference on World Wide Web (WWW '07)}
}
