Towards a System for Ontology-Based Information Extraction from PDF Documents. Oro, E., Ruffolo, M., & Saccà, D. System, 18(05):673, 2008.
Information extraction (IE) is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because of their completely unstructured nature, which poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.
@article{Oro2008,
 title = {Towards a System for Ontology-Based Information Extraction from PDF Documents},
 type = {article},
 year = {2008},
 keywords = {Ontology-based information extraction,PDF document,attribute grammar,augmented transition network,knowledge representation and reasoning,logic programming,ontology,semantics},
 pages = {673},
 volume = {18},
 websites = {http://www.worldscinet.com/ijait/18/1805/S0218213009000354.html},
 id = {e90029b3-adde-378e-aeeb-bb5e7f487e49},
 created = {2012-02-09T21:39:35.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 tags = {ontology-based information extraction},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Oro2008},
 private_publication = {false},
 abstract = {Information extraction (IE) is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because of their completely unstructured nature, which poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.},
 bibtype = {article},
 author = {Oro, Ermelinda and Ruffolo, Massimo and Saccà, Domenico},
 journal = {System},
 number = {05}
}
