Practical and flexible pattern matching over Ziv-Lempel compressed text

Navarro G.; Raffinot M.

Keywords: systems, compression, database, algorithms, approximation, pattern, bit-parallelism, data, theory, parallel, string, processing, matching, problem, solving, Compressed, Ziv-Lempel, Abstracting

Abstract

We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is a highly relevant issue to keep compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts the essential features of Ziv-Lempel compression. We then apply the scheme to each particular type of compression. We present an algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching it. Finally, we propose a new hybrid compression scheme which is between LZ77 and LZ78, being in practice as good to compress as LZ77 and as fast to search as LZ78. We show also how to search for some extended patterns on Ziv-Lempel compressed text, such as classes of characters and approximate string matching. © 2003 Elsevier B.V. All rights reserved.

Más información

Título según SCOPUS: Practical and flexible pattern matching over Ziv-Lempel compressed text
Título de la Revista: JOURNAL OF DISCRETE ALGORITHMS
Volumen: 2
Número: 3
Editorial: ELSEVIER SCIENCE BV
Fecha de publicación: 2004
Página de inicio: 347
Página final: 371
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-10644267814&partnerID=q2rCbXpz
Notas: SCOPUS