Optimal-Time Dictionary-Compressed Indexes

Christiansen, Anders Roy; Ettienne, Mikko Berggren

Abstract

--- - We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result, we combine several recent findings, including string attractors-new combinatorial objects encompassing most known compressibility measures for highly repetitive texts-and grammars based on locally consistent parsing. - More in detail, let gamma be the size of the smallest attractor for a text T of length n. The measure y is an (asymptotic) lower bound to the size of dictionary compressors based on Lempel-Ziv, context-free grammars, and many others. The smallest known text representations in terms of attractors use space O(gamma log(n/gamma)), and our lightest indexes work within the same asymptotic space. Let epsilon > 0 be a suitably small constant fixed at construction time, m be the pattern length, and occ be the number of its text occurrences. Our index counts pattern occurrences in O(m + log(2+epsilon) n) time and locates them in O(m + (occ + 1) log(epsilon) n) time. These times already outperform those of most dictionary-compressed indexes, while obtaining the least asymptotic space for any index searching within O((m + occ) polylog n) time. Further, by increasing the space to O(gamma log(n/gamma) log(epsilon) n), we reduce the locating time to the optimal O(m + occ), and within O(gamma log(n/gamma) log n) space we can also count in optimal O(m) time. No dictionary-compressed index had obtained this time before. All our indexes can be constructed in O(n) space and O(n log n) expected time. - As a by-product of independent interest, we show how to build, in O(n) expected time and without knowing the size y of the smallest attractor (which is NP-hard to find), a run-length context-free grammar of size O(gamma log(n/gamma)) generating (only) T. As a result, our indexes can be built without knowing gamma.

Más información

Título según WOS: Optimal-Time Dictionary-Compressed Indexes
Título de la Revista: ACM TRANSACTIONS ON ALGORITHMS
Volumen: 17
Número: 1
Editorial: ASSOC COMPUTING MACHINERY
Fecha de publicación: 2021
DOI:

10.1145/3426473

Notas: ISI