A Lempel-Ziv text index on secondary storage

A Lempel-Ziv text index on secondary storage. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 4580 LNCS, pages 83-94, 2007.
abstract bibtex

Full-text searching consists in locating the occurrences of a given pattern P[1..m] in a text T[1..u], both sequences over an alphabet of size σ. In this paper we define a new index for full-text searching on secondary storage, based on the Lempel-Ziv compression algorithm and requiring 8uHk + o(u log σ) bits of space, where Hk denotes the k-th order empirical entropy of T, for any k = o(logσ u). Our experimental results show that our index is significantly smaller than any other practical secondary-memory data structure: 1.4-2.3 times the text size including the text, which means 39%-65% the size of traditional indexes like String B-trees [Ferragina and Grossi, JACM 1999]. In exchange, our index requires more disk access to locate the pattern occurrences. Our index is able to report up to 600 occurrences per disk access, for a disk page of 32 kilobytes. If we only need to count pattern occurrences, the space can be reduced to about 1.04-1.68 times the text size, requiring about 20-60 disk accesses, depending on the pattern length. © Springer-Verlag Berlin Heidelberg 2007.

@inproceedings{37849014688,
    abstract = "Full-text searching consists in locating the occurrences of a given pattern P[1..m] in a text T[1..u], both sequences over an alphabet of size σ. In this paper we define a new index for full-text searching on secondary storage, based on the Lempel-Ziv compression algorithm and requiring 8uHk + o(u log σ) bits of space, where Hk denotes the k-th order empirical entropy of T, for any k = o(logσ u). Our experimental results show that our index is significantly smaller than any other practical secondary-memory data structure: 1.4-2.3 times the text size including the text, which means 39\%-65\% the size of traditional indexes like String B-trees [Ferragina and Grossi, JACM 1999]. In exchange, our index requires more disk access to locate the pattern occurrences. Our index is able to report up to 600 occurrences per disk access, for a disk page of 32 kilobytes. If we only need to count pattern occurrences, the space can be reduced to about 1.04-1.68 times the text size, requiring about 20-60 disk accesses, depending on the pattern length. © Springer-Verlag Berlin Heidelberg 2007.",
    year = "2007",
    title = "A Lempel-Ziv text index on secondary storage",
    volume = "4580 LNCS",
    pages = "83-94",
    booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"
}

Downloads: 0

{"_id":"xTidokLcnGvJThQ4C","bibbaseid":"anonymous-alempelzivtextindexonsecondarystorage-2007","downloads":0,"creationDate":"2017-03-31T19:29:34.767Z","title":"A Lempel-Ziv text index on secondary storage","author_short":null,"year":2007,"bibtype":"inproceedings","biburl":"https://1fichier.com/?j9cpurkmnv","bibdata":{"bibtype":"inproceedings","type":"inproceedings","abstract":"Full-text searching consists in locating the occurrences of a given pattern P[1..m] in a text T[1..u], both sequences over an alphabet of size σ. In this paper we define a new index for full-text searching on secondary storage, based on the Lempel-Ziv compression algorithm and requiring 8uHk + o(u log σ) bits of space, where Hk denotes the k-th order empirical entropy of T, for any k = o(logσ u). Our experimental results show that our index is significantly smaller than any other practical secondary-memory data structure: 1.4-2.3 times the text size including the text, which means 39%-65% the size of traditional indexes like String B-trees [Ferragina and Grossi, JACM 1999]. In exchange, our index requires more disk access to locate the pattern occurrences. Our index is able to report up to 600 occurrences per disk access, for a disk page of 32 kilobytes. If we only need to count pattern occurrences, the space can be reduced to about 1.04-1.68 times the text size, requiring about 20-60 disk accesses, depending on the pattern length. © Springer-Verlag Berlin Heidelberg 2007.","year":"2007","title":"A Lempel-Ziv text index on secondary storage","volume":"4580 LNCS","pages":"83-94","booktitle":"Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)","bibtex":"@inproceedings{37849014688,\n abstract = \"Full-text searching consists in locating the occurrences of a given pattern P[1..m] in a text T[1..u], both sequences over an alphabet of size σ. In this paper we define a new index for full-text searching on secondary storage, based on the Lempel-Ziv compression algorithm and requiring 8uHk + o(u log σ) bits of space, where Hk denotes the k-th order empirical entropy of T, for any k = o(logσ u). Our experimental results show that our index is significantly smaller than any other practical secondary-memory data structure: 1.4-2.3 times the text size including the text, which means 39\\%-65\\% the size of traditional indexes like String B-trees [Ferragina and Grossi, JACM 1999]. In exchange, our index requires more disk access to locate the pattern occurrences. Our index is able to report up to 600 occurrences per disk access, for a disk page of 32 kilobytes. If we only need to count pattern occurrences, the space can be reduced to about 1.04-1.68 times the text size, requiring about 20-60 disk accesses, depending on the pattern length. © Springer-Verlag Berlin Heidelberg 2007.\",\n year = \"2007\",\n title = \"A Lempel-Ziv text index on secondary storage\",\n volume = \"4580 LNCS\",\n pages = \"83-94\",\n booktitle = \"Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)\"\n}\n\n","key":"37849014688","id":"37849014688","bibbaseid":"anonymous-alempelzivtextindexonsecondarystorage-2007","urls":{},"downloads":0,"html":""},"search_terms":["lempel","ziv","text","index","secondary","storage"],"keywords":[],"authorIDs":[],"dataSources":["gKiCRHjjC2iGthGEx"]}