Named entity transliteration for cross-language information retrieval using compressed word format mapping algorithm. Janarthanam, S. C., Ramalingam, S., & Nallasamy, U. Proceeding of the 2nd ACM workshop on Improving non english web searching iNEWS 08, ACM Press, 2008.
Transliteration of named entities in user queries is a vital step in any Cross-Language Information Retrieval (CLIR) system. Several methods for transliteration have been proposed to date, based on the nature of the languages considered. In this paper, we present a transliteration algorithm for mapping English named entities to their proper Tamil equivalents. Our algorithm employs a grapheme-based model in which transliteration equivalents are identified by mapping the source language names to their equivalents in a target language database, instead of generating them. The basic principle is to compress the source word into its minimal form and align it across an indexed list of target language words to arrive at the top n equivalents based on the edit distance. We compare the performance of our approach with a statistical generation approach using the Microsoft Research India (MSRI) transliteration corpus. Our approach has proved very effective in terms of accuracy and time.
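The compress-then-match idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not define the "minimal form", so the compression rule used here (lowercase, drop non-initial vowels, collapse repeated letters) is an assumption, as are the function names `compress`, `build_index`, and `top_n`, the romanized keys, and the three-entry toy database. Only the ranking by Levenshtein edit distance follows the abstract directly.

```python
from typing import Dict, List, Tuple

VOWELS = set("aeiou")


def compress(word: str) -> str:
    """Assumed compression rule (hypothetical, not from the paper):
    lowercase, keep the first letter, drop later vowels,
    and collapse consecutive repeated letters."""
    out: List[str] = []
    for ch in word.lower():
        if not ch.isalpha():
            continue
        if out and ch in VOWELS:
            continue
        if out and out[-1] == ch:
            continue
        out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def build_index(entries: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each target word with the compressed form of its romanized key."""
    return [(compress(roman), target) for roman, target in entries.items()]


def top_n(source: str, index: List[Tuple[str, str]], n: int = 3) -> List[Tuple[int, str]]:
    """Return the n target words whose compressed keys are closest to the source."""
    key = compress(source)
    scored = sorted(((edit_distance(key, k), t) for k, t in index),
                    key=lambda pair: pair[0])
    return scored[:n]


# Toy target database: romanized key -> Tamil form (illustrative only).
TOY_ENTRIES = {"chennai": "சென்னை", "madurai": "மதுரை", "salem": "சேலம்"}

if __name__ == "__main__":
    print(top_n("Chenai", build_index(TOY_ENTRIES), n=2))
```

With this toy data, the misspelled query "Chenai" compresses to the same key as "chennai", so that entry is ranked first with distance 0. A linear scan over the index is used for brevity; a real target database would need a proper lookup structure.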
@article{Janarthanam2008,
 title = {Named entity transliteration for cross-language information retrieval using compressed word format mapping algorithm},
 type = {article},
 year = {2008},
 pages = {33},
 websites = {http://portal.acm.org/citation.cfm?doid=1460027.1460033},
 publisher = {ACM Press},
 id = {a7febcc1-d720-319a-94b9-9a781fcbef02},
 created = {2011-12-28T07:04:55.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {named entities},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Janarthanam2008},
 private_publication = {false},
 abstract = {Transliteration of named entities in user queries is a vital step in any Cross-Language Information Retrieval (CLIR) system. Several methods for transliteration have been proposed to date, based on the nature of the languages considered. In this paper, we present a transliteration algorithm for mapping English named entities to their proper Tamil equivalents. Our algorithm employs a grapheme-based model in which transliteration equivalents are identified by mapping the source language names to their equivalents in a target language database, instead of generating them. The basic principle is to compress the source word into its minimal form and align it across an indexed list of target language words to arrive at the top n equivalents based on the edit distance. We compare the performance of our approach with a statistical generation approach using the Microsoft Research India (MSRI) transliteration corpus. Our approach has proved very effective in terms of accuracy and time.},
 bibtype = {article},
 author = {Janarthanam, Srinivasan C and Ramalingam, Sethuramalingam and Nallasamy, Udhyakumar},
 journal = {Proceeding of the 2nd ACM workshop on Improving non english web searching iNEWS 08}
}
