A Hybrid Approach for Named Entity Recognition in Indian Languages. Saha, S., K. & Bengal, W. Processing, 2008.
In this paper we describe a hybrid system that applies a maximum entropy model (MaxEnt), language-specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages, designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language-specific rules are added to the system to recognize some specific NE classes. We have also added some gazetteers and context patterns to the system to increase the performance. As identification of rules and context patterns requires language knowledge, we were able to prepare rules and identify context patterns for Hindi and Bengali only. For the other languages the system uses the MaxEnt model only. After preparing the one-level NER system, we have applied a set of rules to identify the nested entities. The system is able to recognize 12 classes of NEs with 65.13% f-value in Hindi, 65.96% f-value in Bengali and 44.65%, 18.74%, and 35.47% f-value in Oriya, Telugu and Urdu respectively.
@article{Saha2008,
 title = {A Hybrid Approach for Named Entity Recognition in Indian Languages},
 type = {article},
 year = {2008},
 pages = {17--24},
 id = {6661d172-c2b9-39a9-88a3-8aba7501eb51},
 created = {2012-01-21T12:35:31.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {named entity recognition},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Saha2008},
 private_publication = {false},
 abstract = {In this paper we describe a hybrid system that applies a maximum entropy model (MaxEnt), language-specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages, designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language-specific rules are added to the system to recognize some specific NE classes. We have also added some gazetteers and context patterns to the system to increase the performance. As identification of rules and context patterns requires language knowledge, we were able to prepare rules and identify context patterns for Hindi and Bengali only. For the other languages the system uses the MaxEnt model only. After preparing the one-level NER system, we have applied a set of rules to identify the nested entities. The system is able to recognize 12 classes of NEs with 65.13% f-value in Hindi, 65.96% f-value in Bengali and 44.65%, 18.74%, and 35.47% f-value in Oriya, Telugu and Urdu respectively.},
 bibtype = {article},
 author = {Saha, Sujan Kumar and Bengal, West},
 journal = {Processing},
 number = {January}
}
