Chinese named entity recognition using lexicalized HMMs. Fu, G. & Luke, K. ACM SIGKDD Explorations Newsletter, 7(1):19-25, 2005.
This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied.
@article{Fu2005,
 title = {{Chinese} named entity recognition using lexicalized {HMMs}},
 type = {article},
 year = {2005},
 keywords = {character tagging,chinese named entity recognition,known word tagging,lexicalized hidden markov models},
 pages = {19--25},
 volume = {7},
 url = {http://portal.acm.org/citation.cfm?doid=1089815.1089819},
 tags = {named entity recognition},
 citation_key = {Fu2005},
 abstract = {This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply the uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates its pattern in forming an entity and the category of the formed entity. Our system is able to integrate both the internal formation patterns and the surrounding contextual clues for NER under the framework of HMMs. As a result, the performance of the system can be improved without losing its efficiency in training and tagging. We have tested our system using different public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. The results also indicate that character-based tagging (viz. the tagging based on pure single-character words) is comparable to and can even outperform the relevant known-word based tagging when a lexicalization technique is applied.},
 bibtype = {article},
 author = {Fu, Guohong and Luke, Kang-Kwong},
 journal = {ACM SIGKDD Explorations Newsletter},
 number = {1}
}