A Shallow Text Processing Core Engine. Neumann, G. & Piskorski, J. Journal of Computational Intelligence, 18(3):451–476, 2002.
In this paper we present SPPC, a high-performance system for intelligent extraction of structured data from free text documents. SPPC consists of a set of domain-adaptive shallow core components that are realized by means of cascaded weighted finite state machines and generic dynamic tries. The system has been fully implemented for German; it includes morphological and on-line compound analysis, efficient POS-filtering, high performance named entity recognition and chunk parsing based on a novel divide-and-conquer strategy. The whole approach proved to be very useful for processing free word order languages like German. SPPC has a good performance (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage, especially for the divide-and-conquer parsing strategy, where we obtained an f-measure of 87.14% on unseen data.
@article{Neumann/Piskorski:02,
	title = {A {Shallow} {Text} {Processing} {Core} {Engine}},
	volume = {18},
	abstract = {In this paper we present SPPC, a high-performance
system for intelligent extraction of structured data
from free text documents. SPPC consists of a set of
domain-adaptive shallow core components that are
realized by means of cascaded weighted finite state
machines and generic dynamic tries. The system has been
fully implemented for German; it includes morphological
and on-line compound analysis, efficient POS-filtering,
high performance named entity recognition and chunk
parsing based on a novel divide-and-conquer strategy.
The whole approach proved to be very useful for
processing free word order languages like German. SPPC
has a good performance (more than 6000 words per second
on standard PC environments) and achieves high
linguistic coverage, especially for the
divide-and-conquer parsing strategy, where we obtained
an f-measure of 87.14\% on unseen data.},
	journal = {Journal of Computational Intelligence},
	author = {Neumann, Günter and Piskorski, Jakub},
	year = {2002},
	number = {3},
	pages = {451--476},
}
