A Shallow Text Processing Core Engine. Neumann, G. & Piskorski, J. Journal of Computational Intelligence, 18:451–476, 2002. Series Number: 3.

Abstract: In this paper we present SPPC, a high-performance system for intelligent extraction of structured data from free text documents. SPPC consists of a set of domain-adaptive shallow core components that are realized by means of cascaded weighted finite state machines and generic dynamic tries. The system has been fully implemented for German; it includes morphological and on-line compound analysis, efficient POS-filtering, high performance named entity recognition and chunk parsing based on a novel divide-and-conquer strategy. The whole approach proved to be very useful for processing free word order languages like German. SPPC has a good performance (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage, especially for the divide-and-conquer parsing strategy, where we obtained an f-measure of 87.14% on unseen data.
@article{Neumann/Piskorski:02,
title = {A {Shallow} {Text} {Processing} {Core} {Engine}},
volume = {18},
abstract = {In this paper we present SPPC, a high-performance
system for intelligent extraction of structured data
from free text documents. SPPC consists of a set of
domain-adaptive shallow core components that are
realized by means of cascaded weighted finite state
machines and generic dynamic tries. The system has been
fully implemented for German; it includes morphological
and on-line compound analysis, efficient POS-filtering,
high performance named entity recognition and chunk
parsing based on a novel divide-and-conquer strategy.
The whole approach proved to be very useful for
processing free word order languages like German. SPPC
has a good performance (more than 6000 words per second
on standard PC environments) and achieves high
linguistic coverage, especially for the
divide-and-conquer parsing strategy, where we obtained
an f-measure of 87.14\% on unseen data.},
journal = {Journal of Computational Intelligence},
author = {Neumann, Günter and Piskorski, Jakub},
year = {2002},
note = {Series Number: 3},
pages = {451--476},
}