Indexing text using the Ziv-Lempel trie
Keywords: systems, compression, structures, database, recognition, algorithms, text, pattern, language, data, languages, query, character, processing, matching, natural, Compressed, Self-indexing, Succinct
Abstract
Let a text of ? characters over an alphabet of size ? be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv-Lempel trie, called the LZ-index, that takes 4n log2 n(1 +o(1)) bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst case time O(m3 log2 + (m + R) log n). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found. © 2003 Elsevier B.V. All rights reserved.
Más información
Título según SCOPUS: | Indexing text using the Ziv-Lempel trie |
Título de la Revista: | JOURNAL OF DISCRETE ALGORITHMS |
Volumen: | 2 |
Número: | 1 SPEC. ISS. |
Editorial: | ELSEVIER SCIENCE BV |
Fecha de publicación: | 2004 |
Página de inicio: | 87 |
Página final: | 114 |
Idioma: | English |
URL: | http://www.scopus.com/inward/record.url?eid=2-s2.0-10644236411&partnerID=q2rCbXpz |
DOI: |
10.1016/S1570-8667(03)00066-2 |
Notas: | SCOPUS |