Automatic Identification of Temporal Information in Tourism Web Pages. Santini, M. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). European Language Resources Association (ELRA), 2008.
@inProceedings{Santini2008,
 title = {Automatic Identification of Temporal Information in Tourism Web Pages},
 type = {inProceedings},
 year = {2008},
 url = {http://www.lrec-conf.org/proceedings/lrec2008/},
 publisher = {European Language Resources Association (ELRA)},
 institution = {University of Brighton},
 tags = {temporal extraction},
 citation_key = {Santini2008},
 abstract = {This paper presents our work on the detection of temporal information in web pages. The pages examined within the scope of this study were taken from the tourism sector and the temporal information in question is thus particular to this area. The differences that exist between extraction from plain textual data and extraction from the web are brought to light. These differences mainly concern the spatial arrangement of the text, the use of punctuation and adherence to traditional syntactic rules. The temporal expressions to be extracted are classified into two kinds: temporal information that concerns one particular event and repetitive temporal information. We adopt a symbolic approach relying on patterns and rules for the detection, extraction and annotation of temporal expressions; our method is based on the use of transducers. First evaluations have shown promising results. Since the visual structure of a web page is very important and often informs the user before he has even read the text, a semiotic study is also presented in this paper.},
 bibtype = {inProceedings},
 author = {Santini, Marina},
 booktitle = {Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008)}
}
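The abstract describes a symbolic, pattern-and-rule approach that separates temporal information tied to one particular event from repetitive temporal information. The sketch below is not the author's transducer-based system: it approximates a small rule cascade with Python regular expressions. The English patterns, the `<timex>` tag format, and the names `RULES`, `find_temporal` and `annotate` are all hypothetical, chosen only to illustrate the two-way classification.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny lexicons; not taken from the paper.
DAYS = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)s?"
MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")
TIME = r"\d{1,2}(?::\d{2})?\s?(?:am|pm)"

# Ordered (label, pattern) rules. Earlier rules win on overlap, so the
# repetitive patterns are listed first and the longer event range comes
# before the single-date rule.
RULES = [
    ("repetitive", rf"\b(?:every|each)\s+{DAYS}\b"),
    ("repetitive", rf"\b(?:daily|weekly)\b(?:\s+from\s+{TIME}\s+to\s+{TIME})?"),
    ("repetitive", rf"\b{DAYS}\s+to\s+{DAYS}\b"),
    ("event", rf"\bfrom\s+\d{{1,2}}\s+to\s+\d{{1,2}}\s+{MONTHS}(?:\s+\d{{4}})?\b"),
    ("event", rf"\b\d{{1,2}}\s+{MONTHS}(?:\s+\d{{4}})?\b"),
]


@dataclass
class Span:
    start: int
    end: int
    label: str


def find_temporal(text: str) -> list[Span]:
    """Collect non-overlapping matches, giving priority to earlier rules."""
    taken: list[Span] = []
    for label, pattern in RULES:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Keep the match only if it does not overlap an accepted span.
            if all(m.end() <= s.start or m.start() >= s.end for s in taken):
                taken.append(Span(m.start(), m.end(), label))
    return sorted(taken, key=lambda s: s.start)


def annotate(text: str) -> str:
    """Wrap each detected expression in a hypothetical <timex> tag."""
    out, pos = [], 0
    for s in find_temporal(text):
        out.append(text[pos:s.start])
        out.append(f'<timex type="{s.label}">{text[s.start:s.end]}</timex>')
        pos = s.end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    sample = ("Festival from 12 to 15 July 2008. Museum open Tuesday to "
              "Sunday, guided tours every Monday.")
    print(annotate(sample))
```

Ordering the rules by priority is a crude stand-in for the longest-match behaviour a cascade of transducers would normally provide; it also shows why web text is harder than plain prose, since expressions split across table cells or lacking punctuation would slip past patterns like these.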
