Proximity-based document representation for named entity retrieval

Proximity-based document representation for named entity retrieval. Petkova, D. & Croft, W. B. Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07), ACM Press, 2007.


One aspect in which retrieving named entities is different from retrieving documents is that the items to be retrieved - persons, locations, organizations - are only indirectly described by documents throughout the collection. Much work has been dedicated to finding references to named entities, in particular to the problems of named entity extraction and disambiguation. However, just as important for retrieval performance is how these snippets of text are combined to build named entity representations. We focus on the TREC expert search task where the goal is to identify people who are knowledgeable on a specific topic. Existing language modeling techniques for expert finding assume that terms and person entities are conditionally independent given a document. We present theoretical and experimental evidence that this simplifying assumption ignores information on how named entities relate to document content. To address this issue, we propose a new document representation which emphasizes text in proximity to entities and thus incorporates sequential information implicit in text. Our experiments demonstrate that the proposed model significantly improves retrieval performance. The main contribution of this work is an effective formal method for explicitly modeling the dependency between the named entities and terms which appear in a document.
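The idea of emphasizing text in proximity to entities can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything here is an assumption made for illustration: the Gaussian kernel, the `sigma` bandwidth, the function names, and the token-position input format. Each term occurrence is weighted by its distance to every mention of the entity, and the weights are normalized into a term distribution for that entity within one document.

```python
import math
from collections import defaultdict


def proximity_weighted_counts(tokens, entity_positions, sigma=2.0):
    """Accumulate a proximity weight for each term around entity mentions.

    tokens: list of document tokens.
    entity_positions: token indices where the entity is mentioned.
    sigma: kernel bandwidth in tokens (hypothetical parameter).
    """
    positions = set(entity_positions)
    weights = defaultdict(float)
    for i, token in enumerate(tokens):
        if i in positions:
            continue  # do not count the entity mention itself as a term
        for p in positions:
            # Gaussian kernel: terms close to a mention contribute more.
            weights[token] += math.exp(-((i - p) ** 2) / (2 * sigma ** 2))
    return dict(weights)


def entity_language_model(tokens, entity_positions, sigma=2.0):
    """Normalize the proximity weights into a distribution over terms."""
    counts = proximity_weighted_counts(tokens, entity_positions, sigma)
    total = sum(counts.values())
    if total == 0:
        return {}  # the entity is not mentioned in this document
    return {term: weight / total for term, weight in counts.items()}


if __name__ == "__main__":
    doc = "the retrieval expert PERSON studies language models and enjoys cooking".split()
    model = entity_language_model(doc, entity_positions=[3])
    for term, prob in sorted(model.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{prob:.4f}")
```

Under the conditional independence assumption the abstract criticizes, every term in the document would contribute equally to the entity's representation regardless of position; in this sketch a term adjacent to the mention receives more weight than one several tokens away.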

@article{
 title = {Proximity-based document representation for named entity retrieval},
 type = {article},
 year = {2007},
 pages = {731},
 websites = {http://portal.acm.org/citation.cfm?doid=1321440.1321542},
 publisher = {ACM Press},
 id = {2dd6feba-7145-30f8-b1f6-41c62f2c66dc},
 created = {2011-02-27T18:33:21.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Petkova2007},
 private_publication = {false},
 abstract = {One aspect in which retrieving named entities is different from retrieving documents is that the items to be retrieved - persons, locations, organizations - are only indirectly described by documents throughout the collection. Much work has been dedicated to finding references to named entities, in particular to the problems of named entity extraction and disambiguation. However, just as important for retrieval performance is how these snippets of text are combined to build named entity representations. We focus on the TREC expert search task where the goal is to identify people who are knowledgeable on a specific topic. Existing language modeling techniques for expert finding assume that terms and person entities are conditionally independent given a document. We present theoretical and experimental evidence that this simplifying assumption ignores information on how named entities relate to document content. To address this issue, we propose a new document representation which emphasizes text in proximity to entities and thus incorporates sequential information implicit in text. Our experiments demonstrate that the proposed model significantly improves retrieval performance. The main contribution of this work is an effective formal method for explicitly modeling the dependency between the named entities and terms which appear in a document.},
 bibtype = {article},
 author = {Petkova, Desislava and Croft, W Bruce},
 journal = {Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07)}
}

