A compressed text index on secondary memory

González R.; Navarro G.

Keywords: access, text, arrays, secondary, classical, counterpart, memories, Random, Suffix, Indices, High-order, Disk-based

Abstract

We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides good I/O times for searching, which in particular improve when the text is compressible. In this aspect our index is unique, as most compressed indexes are slower than their classical counterparts on secondary memory. We analyze our index and show experimentally that it is extremely competitive on compressible texts. As side contributions, we introduce a compressed rank dictionary for secondary memory operating in one I/O access, as well as a simple encoding of sequences that achieves high-order compression and provides constant-time random access, both in main and secondary memory.

Más información

Título de la Revista: Journal of Combinatorial Mathematics and Combinatorial Computing
Volumen: 71
Editorial: Charles Babbage Research Centre
Fecha de publicación: 2009
Página de inicio: 127
Página final: 154
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-78651561359&partnerID=q2rCbXpz