Reducing the space requirement of LZ-index. Sadakane, K., Arroyuelo, D., & Navarro, G. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), volume 4009 LNCS, pages 318-329, 2006.
doi  abstract   bibtex   
The LZ-index is a compressed full-text self-index able to represent a text T1...u, over an alphabet of size σ = O(polylog(u)) and with k-th order empirical entropy Hk(T), using 4uHk(T) + o(u log σ) bits for any k = o(logσ u). It can report all the occ occurrences of a pattern P1...u in T in O(m3 log σ + (m + occ) log u) worst case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of LZ-index. In both cases we achieve (2 + ε)uH k(T) + o(u log σ) bits of space, for any constant ε > 0, and we simultaneously improve the search time to O(m2 log m + (m + occ) log u). Both indexes support displaying any subtext of length ℓ in optimal O(ℓ/ logσ u) time. In addition, we show how the space can be squeezed to (1 + ε)uHk(T) + o(u log σ) to obtain a structure with O(m2) average search time for m ≥ 2 logσ u. © Springer-Verlag Berlin Heidelberg 2006.
@inproceedings{10.1007/11780441_29,
    abstract = "The LZ-index is a compressed full-text self-index able to represent a text T<inf>1...u</inf>, over an alphabet of size σ = O(polylog(u)) and with k-th order empirical entropy H<inf>k</inf>(T), using 4uH<inf>k</inf>(T) + o(u log σ) bits for any k = o(log<inf>σ</inf> u). It can report all the occ occurrences of a pattern P<inf>1...u</inf> in T in O(m<sup>3</sup> log σ + (m + occ) log u) worst case time. Its main drawback is the factor 4 in its space complexity, which makes it larger than other state-of-the-art alternatives. In this paper we present two different approaches to reduce the space requirement of LZ-index. In both cases we achieve (2 + ε)uH <inf>k</inf>(T) + o(u log σ) bits of space, for any constant ε > 0, and we simultaneously improve the search time to O(m<sup>2</sup> log m + (m + occ) log u). Both indexes support displaying any subtext of length ℓ in optimal O(ℓ/ log<inf>σ</inf> u) time. In addition, we show how the space can be squeezed to (1 + ε)uH<inf>k</inf>(T) + o(u log σ) to obtain a structure with O(m<sup>2</sup>) average search time for m ≥ 2 log<inf>σ</inf> u. © Springer-Verlag Berlin Heidelberg 2006.",
    year = "2006",
    title = "Reducing the space requirement of LZ-index",
    volume = "4009 LNCS",
    pages = "318-329",
    doi = "10.1007/11780441\_29",
    booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    author = "Sadakane, Kunihiko and Arroyuelo, Diego and Navarro, Gonzalo"
}

Downloads: 0