Pattern-Based Extraction of Addresses from Web Page Content. Asadi, S., Yang, G., Zhou, X., Shi, Y., Zhai, B., & Jiang, W. W. In Proceedings of the 10th Asia-Pacific Web Conference, APWeb 2008, pages 407-418, 2008.
Extraction of addresses and location names from Web pages is a challenging task for search engines. Traditional information extraction and natural language processing models remain unsuccessful in the context of the Web because of the uncontrolled heterogeneous nature of the Web resources as well as the effects of HTML and other markup tags. We describe a new pattern-based approach for extraction of addresses from Web pages. Both HTML and vision-based segmentations are used to increase the quality of address extraction. The proposed system uses several address patterns and a small table of geographic knowledge to hit addresses and then itemize them into smaller components. The experiments show that this model can extract and itemize different addresses effectively without large gazetteers or human supervision.
@inProceedings{Asadi2008,
 title = {Pattern-Based Extraction of Addresses from Web Page Content},
 type = {inProceedings},
 year = {2008},
 keywords = {Address Extraction,Address Itemization,Web page Analysis},
 pages = {407--418},
 id = {82f2a095-5e5a-325f-bc86-571150695765},
 created = {2012-12-24T15:02:36.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {address extraction},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Asadi2008},
 private_publication = {false},
 abstract = {Extraction of addresses and location names from Web pages is a challenging task for search engines. Traditional information extraction and natural language processing models remain unsuccessful in the context of the Web because of the uncontrolled heterogeneous nature of the Web resources as well as the effects of HTML and other markup tags. We describe a new pattern-based approach for extraction of addresses from Web pages. Both HTML and vision-based segmentations are used to increase the quality of address extraction. The proposed system uses several address patterns and a small table of geographic knowledge to hit addresses and then itemize them into smaller components. The experiments show that this model can extract and itemize different addresses effectively without large gazetteers or human supervision.},
 bibtype = {inProceedings},
 author = {Asadi, Saeid and Yang, Guowei and Zhou, Xiaofang and Shi, Yuan and Zhai, Boxuan and Jiang, Wendy Wen-Rong},
 booktitle = {Proceedings of the 10th Asia-Pacific Web Conference, APWeb 2008}
}
