Information Extraction by Text Classification: Corpus Mining for Features. Zavrel, J., Berck, P., & Lavrijssen, W. In Proceedings of the Second International Conference on Language Resources and Evaluation LREC00, 2000.
This paper describes a method for building an Information Extraction (IE) system using standard text classification machine learning techniques, and data mining for complex features on a large corpus of example texts that are only superficially annotated. We have successfully used this method to build an IE system (Textractor) for job advertisements.

1. Introduction
For rapid development of an Information Extraction system in a large new domain, the usual methods of semi-corpus-based hand-crafting of extraction rules are often simply too laborious. Therefore one must turn to the use of machine learning techniques and try to induce the knowledge needed for extraction from annotated training samples. Techniques for the induction of extraction rules are e.g. described by (Freitag, 1998
@inProceedings{Zavrel2000,
 title = {Information Extraction by Text Classification: Corpus Mining for Features},
 type = {inProceedings},
 year = {2000},
 websites = {citeseer.nj.nec.com/323617.html},
 id = {5aa6e257-773f-32cd-aad1-9567b7870d79},
 created = {2011-01-29T09:23:47.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Zavrel2000},
 private_publication = {false},
 abstract = {This paper describes a method for building an Information Extraction (IE) system using standard text classification machine learning techniques, and data mining for complex features on a large corpus of example texts that are only superficially annotated. We have successfully used this method to build an IE system (Textractor) for job advertisements. 1. Introduction For rapid development of an Information Extraction system in a large new domain, the usual methods of semi-corpus-based hand-crafting of extraction rules are often simply too laborious. Therefore one must turn to the use of machine learning techniques and try to induce the knowledge needed for extraction from annotated training samples. Techniques for the induction of extraction rules are e.g. described by (Freitag, 1998},
 bibtype = {inProceedings},
 author = {Zavrel, Jakub and Berck, Peter and Lavrijssen, Willem},
 booktitle = {Proceedings of the Second International Conference on Language Resources and Evaluation LREC00}
}
