A Novel Method of Chinese Web Information Extraction and Applications. Liu, Z. & Wang, Y. 2009 WASE International Conference on Information Engineering, IEEE, 2009.
One promising application of natural language processing (NLP) research is information extraction (IE). In this paper, we present the workflow of our IE system for extracting semantically rich information from unstructured or semi-structured Chinese web pages. A knowledge engineering approach and an automatic training approach are used to extract patterns and build a knowledge repository. A general IE system requires its training web pages to be labeled. We develop a novel methodology that does not require labeled text; it comprises hierarchical filtration pattern matching based on syntax, using a best-distance method, and maximum forward boundary recognition using an organization suffix repository and part-of-speech tagging. As an application of IE, we build a new object-level vertical search system in which the objects are Chinese people, so IE is concerned with extracting people's attributes from a collection of web pages about Chinese people. The results are displayed as a hierarchical directory tree organized by people's attributes. The system lets users find people quickly and easily.
@article{Liu2009b,
 title = {A Novel Method of Chinese Web Information Extraction and Applications},
 type = {article},
 year = {2009},
 keywords = {extraction,ie,information,machine learning,ml,natural language processing,nlp},
 pages = {65--68},
 websites = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5211147},
 publisher = {IEEE},
 id = {6c20f9bb-9fe5-36fe-a4d6-62e958971e50},
 created = {2012-02-28T00:51:15.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Liu2009b},
 private_publication = {false},
 abstract = {One promising application of natural language processing (NLP) research is information extraction (IE). In this paper, we present the workflow of our IE system for extracting semantically rich information from unstructured or semi-structured Chinese web pages. A knowledge engineering approach and an automatic training approach are used to extract patterns and build a knowledge repository. A general IE system requires its training web pages to be labeled. We develop a novel methodology that does not require labeled text; it comprises hierarchical filtration pattern matching based on syntax, using a best-distance method, and maximum forward boundary recognition using an organization suffix repository and part-of-speech tagging. As an application of IE, we build a new object-level vertical search system in which the objects are Chinese people, so IE is concerned with extracting people's attributes from a collection of web pages about Chinese people. The results are displayed as a hierarchical directory tree organized by people's attributes. The system lets users find people quickly and easily.},
 bibtype = {article},
 author = {Liu, Zhong and Wang, Ying},
 journal = {2009 WASE International Conference on Information Engineering}
}
