Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems

Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems. Huang, L., May, J., Pan, X., Ji, H., Ren, X., Han, J., Zhao, L., & Hendler, J. A. Big Data, 5(1):19-31, 2017. PMID: 28328252


Abstract: The ability to automatically recognize and type entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing natural language data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm that types all mentions using these representations. The framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forums) show performance comparable to that of state-of-the-art supervised typing systems trained on large amounts of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
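The abstract only outlines the method, so the following is a toy Python sketch of the general idea of typing mentions by clustering their representations, not the authors' algorithm. Assumptions, all hypothetical: each of the three per-mention representations is a hand-made 2-d vector, the three are combined by plain concatenation, and ordinary average-linkage agglomerative clustering cut at a cosine-similarity threshold stands in for the paper's joint hierarchical clustering and linking. Knowledge-base linking and naming of the resulting types are not modeled. The names `concat`, `cosine`, `agglomerative`, and the threshold value are illustrative only.

```python
import math
from typing import Dict, List


def concat(*parts: List[float]) -> List[float]:
    """Join several per-mention vectors into one (assumed combination scheme)."""
    out: List[float] = []
    for p in parts:
        out.extend(p)
    return out


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def agglomerative(vectors: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Average-linkage agglomerative clustering; stops when no pair of
    clusters has average similarity strictly above `threshold`."""
    clusters = [[name] for name in vectors]

    def avg_sim(c1: List[str], c2: List[str]) -> float:
        total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, best_sim = None, threshold
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = avg_sim(clusters[i], clusters[j])
                if s > best_sim:
                    best, best_sim = (i, j), s
        if best is None:  # no pair is similar enough: stop merging
            break
        i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical toy mentions: (general, context, knowledge) vectors per mention.
mentions = {
    "Obama": concat([1.0, 0.0], [0.9, 0.1], [1.0, 0.0]),
    "Merkel": concat([0.9, 0.1], [1.0, 0.0], [0.9, 0.0]),
    "Paris": concat([0.0, 1.0], [0.1, 0.9], [0.0, 1.0]),
    "Berlin": concat([0.1, 0.9], [0.0, 1.0], [0.0, 0.9]),
}

clusters = agglomerative(mentions, threshold=0.5)
print(clusters)  # → [['Obama', 'Merkel'], ['Paris', 'Berlin']]
```

Each resulting cluster plays the role of an induced entity type (here, person-like and location-like mentions), discovered without a predefined schema; in the framework described above, clusters would additionally be linked to knowledge-base types to obtain labels and a type hierarchy.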

@article{doi:10.1089/big.2017.0012,
  author  = {Huang, Lifu and May, Jonathan and Pan, Xiaoman and Ji, Heng and Ren, Xiang and Han, Jiawei and Zhao, Lin and Hendler, James A.},
  title   = {Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems},
  journal = {Big Data},
  volume  = {5},
  number  = {1},
  pages   = {19--31},
  year    = {2017},
  doi     = {10.1089/big.2017.0012},
  note    = {PMID: 28328252},
  url     = {https://doi.org/10.1089/big.2017.0012}
}
