Exploiting the Hierarchical Structure of a Thesaurus for Document Classification. Filtz, E.; Kirrane, S.; Polleres, A.; and Wohlgenannt, G. In 27th International Conference on Cooperative Information Systems (CoopIS 2019), Rhodes, Greece, October 2019. Springer. To appear.
Multi-label document classification is a challenging problem because of the potentially huge number of classes. Furthermore, real-world datasets often exhibit a strongly varying number of labels per document, and a power-law distribution of those class labels. Multi-label classification of legal documents is additionally complicated by long document texts and domain-specific use of language. In this paper, we use different approaches to compare the performance of text classification algorithms on existing datasets and corpora of legal documents, and contrast those experiments with results on general-purpose multi-label text classification datasets. Moreover, for the EUR-Lex legal datasets, we show that exploiting the hierarchy of the EuroVoc thesaurus helps to improve classification performance by reducing the number of potential classes while retaining the informative value of the classification itself.
@inproceedings{filt-etal-2019COOPIS,
 title = {Exploiting the Hierarchical Structure of a Thesaurus for Document Classification},
 author = {Erwin Filtz and Sabrina Kirrane and Axel Polleres and Gerhard Wohlgenannt},
 abstract = {Multi-label document classification is a challenging problem because of the potentially huge number
of classes. Furthermore, real-world datasets often exhibit a strongly varying number of labels per document, and a power-law distribution of those class labels.
Multi-label classification of legal documents is additionally complicated by long document texts and domain-specific use of language.
In this paper, we use different approaches to compare the performance of text classification algorithms on existing datasets and corpora of legal documents, and contrast those experiments with results on general-purpose multi-label text classification datasets.
Moreover, for the EUR-Lex legal datasets, we show that exploiting the hierarchy of the EuroVoc thesaurus helps to improve classification performance by reducing the number of potential classes while retaining the informative value of the classification itself.},
year = 2019,
month = oct,
day = {23-25},
address = {Rhodes, Greece},
booktitle = {27th International Conference on Cooperative Information Systems (CoopIS 2019)},
publisher = {Springer},
url = {http://www.polleres.net/publications/filt-etal-2019COOPIS.pdf},
note = {to appear},
type = CONF,
}