Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems. Huang, L., May, J., Pan, X., Ji, H., Ren, X., Han, J., Zhao, L., & Hendler, J. A. Big Data, 5(1):19-31, 2017. PMID: 28328252
Abstract: The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
@article{doi:10.1089/big.2017.0012,
author = {Huang, Lifu and May, Jonathan and Pan, Xiaoman and Ji, Heng and Ren, Xiang and Han, Jiawei and Zhao, Lin and Hendler, James A.},
title = {Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems},
journal = {Big Data},
volume = {5},
number = {1},
pages = {19--31},
year = {2017},
doi = {10.1089/big.2017.0012},
note = {PMID: 28328252},
url = {https://doi.org/10.1089/big.2017.0012},
eprint = {https://doi.org/10.1089/big.2017.0012},
abstract = {The ability of automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework by combining symbolic and distributional semantics. We start from learning three types of representations for each entity mention: general semantic representation, specific context representation, and knowledge representation based on knowledge bases. Then we develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on genres (news and discussion forum) show comparable performance with state-of-the-art supervised typing systems trained from a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.}
}
