Named Entity Recognition without Gazetteers Using a Machine Learning Approach. Zhou. Artificial Intelligence, 2002.
Gazetteers, such as lists of names of persons, organizations, locations and other entities, have always been cited as a bottleneck of Named Entity (NE) Recognition (NER) systems. This paper proposes a modified Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which an NER system is built to recognize and classify names, times and numerical quantities. Through the modified HMM, our system is able to apply and integrate three types of internal and external evidence: 1) simple deterministic internal features of the words, such as capitalization and digitalization; 2) internal semantic features of important triggers; and 3) external macro context features. In this way, the NER problem can be resolved effectively without gazetteers. On the MUC-6 and MUC-7 English NE tasks, our system achieves F-measures of 95.4% and 93.0%, respectively. This shows that gazetteers need not be a bottleneck for NER: the performance of our system (without gazetteers) is consistently better than that of any other reported machine-learning system and even comparable to that of the best reported system based on handcrafted rules.
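The first two kinds of evidence listed in the abstract can be illustrated with a small Python sketch. It maps each token to a deterministic word-feature class (capitalization and digit patterns) and to a trigger class, producing the kind of (feature, trigger, word) observation an HMM-based chunk tagger could consume. The class names, the `TRIGGERS` table and the functions `word_feature`, `trigger_feature` and `observe` are hypothetical illustrations, not the paper's actual feature set or code, and the external macro context feature is not modeled.

```python
# Hypothetical sketch: deterministic internal word features (evidence type 1)
# and trigger semantic features (evidence type 2) for an HMM-style NE tagger.
# Feature names and the trigger table are illustrative assumptions only.

# Hypothetical trigger table: lowercased trigger word -> semantic class.
TRIGGERS = {
    "mr.": "PER_PREFIX",
    "mrs.": "PER_PREFIX",
    "inc.": "ORG_SUFFIX",
    "corp.": "ORG_SUFFIX",
    "river": "LOC_SUFFIX",
    "monday": "DATE_DAY",
}


def word_feature(token: str, is_first: bool = False) -> str:
    """Map a token to a deterministic class from its digits and capitalization."""
    if token.isdigit():
        # Pure digit strings are split by length (e.g. years are 4 digits).
        if len(token) == 2:
            return "TwoDigitNum"
        if len(token) == 4:
            return "FourDigitNum"
        return "OtherNum"
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    if has_digit and has_alpha:
        return "ContainsDigitAndAlpha"   # e.g. "A4", "3M"
    if has_digit and "/" in token:
        return "ContainsDigitAndSlash"   # e.g. dates like "12/05"
    if has_digit:
        return "OtherNum"                # e.g. "3.5", "1,000"
    if token.isupper():
        return "AllCaps"                 # e.g. "IBM"
    if is_first:
        return "FirstWord"               # sentence-initial capitals are uninformative
    if token[:1].isupper():
        return "InitialCap"              # e.g. "Smith"
    if token.islower():
        return "LowerCase"
    return "Other"                       # punctuation, mixed tokens, etc.


def trigger_feature(token: str) -> str:
    """Return the semantic class of a known trigger word, or "None"."""
    return TRIGGERS.get(token.lower(), "None")


def observe(tokens: list[str]) -> list[tuple[str, str, str]]:
    """Build one (word feature, trigger feature, word) observation per token."""
    return [
        (word_feature(tok, is_first=(i == 0)), trigger_feature(tok), tok)
        for i, tok in enumerate(tokens)
    ]


if __name__ == "__main__":
    for obs in observe(["Mr.", "Smith", "joined", "IBM", "in", "1998", "."]):
        print(obs)
```

Running it prints one tuple per token, for instance `('FirstWord', 'PER_PREFIX', 'Mr.')` for the sentence-initial title and `('FourDigitNum', 'None', '1998')` for the year. A real tagger would feed these observations to the HMM rather than print them.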
@article{Zhou2002,
 title = {Named Entity Recognition without Gazetteers Using a Machine Learning Approach},
 type = {article},
 year = {2002},
 websites = {citeseer.ist.psu.edu/article/zhou02named.html},
 id = {5aae1506-6f58-30d2-96a8-869d89dbeecd},
 created = {2011-12-28T07:04:55.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {named entities},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Zhou2002},
 private_publication = {false},
 abstract = {Gazetteers, such as lists of names of persons, organizations, locations and other entities, have always been cited as a bottleneck of Named Entity (NE) Recognition (NER) systems. This paper proposes a modified Hidden Markov Model (HMM) and an HMM-based chunk tagger, from which an NER system is built to recognize and classify names, times and numerical quantities. Through the modified HMM, our system is able to apply and integrate three types of internal and external evidence: 1) simple deterministic internal features of the words, such as capitalization and digitalization; 2) internal semantic features of important triggers; and 3) external macro context features. In this way, the NER problem can be resolved effectively without gazetteers. On the MUC-6 and MUC-7 English NE tasks, our system achieves F-measures of 95.4% and 93.0%, respectively. This shows that gazetteers need not be a bottleneck for NER: the performance of our system (without gazetteers) is consistently better than that of any other reported machine-learning system and even comparable to that of the best reported system based on handcrafted rules.},
 bibtype = {article},
 author = {Zhou},
 journal = {Artificial Intelligence}
}