Indexing text using the Ziv-Lempel trie

Navarro G.

Keywords: systems, compression, structures, database, recognition, algorithms, text, pattern, language, data, languages, query, character, processing, matching, natural, Compressed, Self-indexing, Succinct

Abstract

Let a text of ? characters over an alphabet of size ? be compressible to n phrases by the LZ78 algorithm. We show how to build a data structure based on the Ziv-Lempel trie, called the LZ-index, that takes 4n log2 n(1 +o(1)) bits of space (that is, 4 times the entropy of the text for ergodic sources) and reports the R occurrences of a pattern of length m in worst case time O(m3 log2 + (m + R) log n). We present a practical implementation of the LZ-index, which is faster than current alternatives when we take into consideration the time to report the positions or text contexts of the occurrences found. © 2003 Elsevier B.V. All rights reserved.

Más información

Título según SCOPUS: Indexing text using the Ziv-Lempel trie
Título de la Revista: JOURNAL OF DISCRETE ALGORITHMS
Volumen: 2
Número: 1 SPEC. ISS.
Editorial: ELSEVIER SCIENCE BV
Fecha de publicación: 2004
Página de inicio: 87
Página final: 114
Idioma: English
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-10644236411&partnerID=q2rCbXpz
DOI:

10.1016/S1570-8667(03)00066-2

Notas: SCOPUS