Description of the LTG System Used for MUC-7. Mikheev, A., Grover, C., & Moens, M. 1998.
The basic building blocks in our muc system are reusable text handling tools which we have been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tool does a very specific job, but can be combined with other tools in a unix pipeline. Different combinations of the same tools can thus be used in a pipeline for completing different tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose eXtensible Markup Language (xml). xml is an official, simplified version of Standard Generalised Markup Language (sgml), simplified to make processing easier [3]. We were involved in the development of the xml standard, building on our expertise in the design of our own Normalised sgml (nsl) and nsl tool lt nsl [10], and our xml tool lt xml [11]. A detailed comparison of this sgml-oriented architecture with more traditional data-base oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an api for all its access to xml and sgml data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously added markup. This approach allows us to remain entirely within the sgml paradigm for corpus markup while allowing us to be very general in the design of our tools, each of which can be used for many purposes. Furthermore, because we can pipe data through processes, the unix operating system itself provides the natural "glue" for integrating data-level applications. The sgml-handling api in our workbench is our lt nsl library [10] which can handle even the most complex document structures (dtds). It allows a tool to read, change or add attribute values and character data to sgml elements and to address a particular element in an nsl or xml stream using a query language called ltquery.
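The pipeline idea the abstract describes — each stage adding markup to an XML stream without destroying markup added earlier — can be sketched in miniature. The `sed` stages below are hypothetical stand-ins for the LT tools (the tool names, element names, and sentence are illustrative, not taken from the paper):

```shell
# Stage 1 (stand-in for a tokenizer): wrap a word in <w> markup.
# Stage 2 (stand-in for a named-entity tagger): wrap the <w> element in
# <enamex> markup, leaving the earlier <w> markup intact.
echo '<doc>Edinburgh is a city.</doc>' \
  | sed 's|Edinburgh|<w>Edinburgh</w>|' \
  | sed 's|<w>Edinburgh</w>|<enamex type="LOCATION"><w>Edinburgh</w></enamex>|'
# prints: <doc><enamex type="LOCATION"><w>Edinburgh</w></enamex> is a city.</doc>
```

Each stage reads the stream on stdin and writes enriched XML on stdout, so the unix pipe itself is the integration "glue", as the abstract puts it.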
@misc{Mikheev1998,
 title = {Description of the LTG System Used for MUC-7},
 type = {misc},
 year = {1998},
 source = {Proceedings of the Seventh Message Understanding Conference (MUC-7)},
 id = {ba3a4503-726f-3c93-b390-7614803ae7e8},
 created = {2012-02-28T00:51:15.000Z},
 file_attached = {false},
 profile_id = {5284e6aa-156c-3ce5-bc0e-b80cf09f3ef6},
 group_id = {066b42c8-f712-3fc3-abb2-225c158d2704},
 last_modified = {2017-03-14T14:36:19.698Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 citation_key = {Mikheev1998},
 private_publication = {false},
 abstract = {The basic building blocks in our muc system are reusable text handling tools which we have been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tool does a very specific job, but can be combined with other tools in a unix pipeline. Different combinations of the same tools can thus be used in a pipeline for completing different tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose eXtensible Markup Language (xml). xml is an official, simplified version of Standard Generalised Markup Language (sgml), simplified to make processing easier [3]. We were involved in the development of the xml standard, building on our expertise in the design of our own Normalised sgml (nsl) and nsl tool lt nsl [10], and our xml tool lt xml [11]. A detailed comparison of this sgml-oriented architecture with more traditional data-base oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an api for all its access to xml and sgml data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously added markup. This approach allows us to remain entirely within the sgml paradigm for corpus markup while allowing us to be very general in the design of our tools, each of which can be used for many purposes. Furthermore, because we can pipe data through processes, the unix operating system itself provides the natural "glue" for integrating data-level applications. The sgml-handling api in our workbench is our lt nsl library [10] which can handle even the most complex document structures (dtds). It allows a tool to read, change or add attribute values and character data to sgml elements and to address a particular element in an nsl or xml stream using a query language called ltquery.},
 bibtype = {misc},
 author = {Mikheev, Andrei and Grover, Claire and Moens, Marc}
}
