A Chinese Named Entity Extraction System. Peterson, E. In Proceedings of the 8th Annual Conference of the International Association of Chinese Linguistics, 1999.
For many information applications, being able to identify the proper names and other entities in a text is a vital step in understanding and using the text. For example, in a Chinese-English machine translation system, if a word is identified as a person name, it can be converted to pinyin, rather than being translated as a regular word. Other entities, such as times, dates, and money amounts, are best translated by modules with special knowledge of these domains. This paper discusses the development and testing of a Perl-based named entity identification and extraction system for simplified Chinese text. This system can serve as a first stage to other Chinese language processing systems. Entities that are identified include locations, person names, organizations, dates, times, money amounts, and percentages. The system uses a segmenter and a specially created pattern matching language to identify these named entities. Useful criteria for finding each of these entity types, along with the major problems in finding them, are discussed. Initial runs of the system on a test corpus produced promising precision and recall scores of 52 and 46, respectively. Finally, the paper proposes possibilities for ways to improve the extraction system.
@inProceedings{Peterson1999,
 title = {A Chinese Named Entity Extraction System},
 type = {inProceedings},
 year = {1999},
 id = {2368df42-8c68-3fae-8c83-d3ce03f26e7f},
 created = {2012-02-28T00:52:49.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {named entity recognition},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Peterson1999},
 private_publication = {false},
 abstract = {For many information applications, being able to identify the proper names and other entities in a text is a vital step in understanding and using the text. For example, in a Chinese-English machine translation system, if a word is identified as a person name, it can be converted to pinyin, rather than being translated as a regular word. Other entities, such as times, dates, and money amounts, are best translated by modules with special knowledge of these domains. This paper discusses the development and testing of a Perl-based named entity identification and extraction system for simplified Chinese text. This system can serve as a first stage to other Chinese language processing systems. Entities that are identified include locations, person names, organizations, dates, times, money amounts, and percentages. The system uses a segmenter and a specially created pattern matching language to identify these named entities. Useful criteria for finding each of these entity types, along with the major problems in finding them, are discussed. Initial runs of the system on a test corpus produced promising precision and recall scores of 52 and 46, respectively. Finally, the paper proposes possibilities for ways to improve the extraction system.},
 bibtype = {inProceedings},
 author = {Peterson, Erik},
 booktitle = {Proceedings of the 8th Annual Conference of the International Association of Chinese Linguistics}
}
