A Hybrid Approach for Named Entity Recognition in Indian Languages

A Hybrid Approach for Named Entity Recognition in Indian Languages. Saha, S. K. & others. Processing, 2008.
In this paper we describe a hybrid system that applies a maximum entropy model (MaxEnt), language-specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages, designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language-specific rules are added to the system to recognize some specific NE classes. We have also added some gazetteers and context patterns to the system to increase the performance. As identification of rules and context patterns requires language knowledge, we were able to prepare rules and identify context patterns for Hindi and Bengali only. For the other languages the system uses the MaxEnt model only. After preparing the one-level NER system, we have applied a set of rules to identify the nested entities. The system is able to recognize 12 classes of NEs with 65.13% f-value in Hindi, 65.96% f-value in Bengali and 44.65%, 18.74%, and 35.47% f-value in Oriya, Telugu and Urdu respectively.
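
The abstract does not say how the statistical model, the rules and the gazetteers are combined, so the following is only a minimal sketch of one possible arrangement, not the authors' implementation. It assumes a baseline tagger (here a hypothetical stand-in, `baseline_tagger`, in place of a trained MaxEnt model), a hand-written rule (`rule_number`) and a toy gazetteer (`GAZETTEER`); all names, labels and entries are invented for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy gazetteer: lower-cased surface form -> entity label.
GAZETTEER: Dict[str, str] = {"kolkata": "LOCATION", "delhi": "LOCATION"}


def baseline_tagger(tokens: List[str]) -> List[str]:
    """Stand-in for a trained statistical tagger; it labels everything 'O'."""
    return ["O"] * len(tokens)


def rule_number(token: str) -> Optional[str]:
    """Illustrative hand-written rule: an all-digit token is a NUMBER."""
    return "NUMBER" if token.isdigit() else None


def hybrid_tag(
    tokens: List[str],
    baseline: Callable[[List[str]], List[str]] = baseline_tagger,
) -> List[str]:
    """Combine baseline predictions with a rule and a gazetteer lookup.

    Assumed precedence: rule > baseline prediction > gazetteer fallback.
    """
    tags = list(baseline(tokens))
    for i, token in enumerate(tokens):
        rule_tag = rule_number(token)
        if rule_tag is not None:
            tags[i] = rule_tag  # the rule overrides the baseline
        elif tags[i] == "O" and token.lower() in GAZETTEER:
            tags[i] = GAZETTEER[token.lower()]  # fill gaps from the gazetteer
    return tags


if __name__ == "__main__":
    print(hybrid_tag(["Visited", "Kolkata", "in", "2008"]))
```

The precedence order is a design choice of this sketch. A system could equally feed gazetteer membership to the statistical model as a feature rather than applying it as a post-processing step.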

@article{Saha2008,
 title = {A Hybrid Approach for Named Entity Recognition in Indian Languages},
 type = {article},
 year = {2008},
 pages = {17-24},
 id = {6661d172-c2b9-39a9-88a3-8aba7501eb51},
 created = {2012-01-21T12:35:31.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {named entity recognition},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Saha2008},
 private_publication = {false},
 abstract = {In this paper we describe a hybrid system that applies a maximum entropy model (MaxEnt), language-specific rules and gazetteers to the task of named entity recognition (NER) in Indian languages, designed for the IJCNLP NERSSEAL shared task. Starting with named entity (NE) annotated corpora and a set of features, we first build a baseline NER system. Then some language-specific rules are added to the system to recognize some specific NE classes. We have also added some gazetteers and context patterns to the system to increase the performance. As identification of rules and context patterns requires language knowledge, we were able to prepare rules and identify context patterns for Hindi and Bengali only. For the other languages the system uses the MaxEnt model only. After preparing the one-level NER system, we have applied a set of rules to identify the nested entities. The system is able to recognize 12 classes of NEs with 65.13% f-value in Hindi, 65.96% f-value in Bengali and 44.65%, 18.74%, and 35.47% f-value in Oriya, Telugu and Urdu respectively.},
 bibtype = {article},
 author = {Saha, Sujan Kumar and others},
 journal = {Processing},
 month = {January}
}
