A compressed text index on secondary memory
Keywords: access, text, arrays, secondary, classical, counterpart, memories, Random, Suffix, Indices, High-order, Disk-based
Abstract
We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides good I/O times for searching, which in particular improve when the text is compressible. In this aspect our index is unique, as most compressed indexes are slower than their classical counterparts on secondary memory. We analyze our index and show experimentally that it is extremely competitive on compressible texts. As side contributions, we introduce a compressed rank dictionary for secondary memory operating in one I/O access, as well as a simple encoding of sequences that achieves high-order compression and provides constant-time random access, both in main and secondary memory.
Más información
Título de la Revista: | Journal of Combinatorial Mathematics and Combinatorial Computing |
Volumen: | 71 |
Editorial: | Charles Babbage Research Centre |
Fecha de publicación: | 2009 |
Página de inicio: | 127 |
Página final: | 154 |
URL: | http://www.scopus.com/inward/record.url?eid=2-s2.0-78651561359&partnerID=q2rCbXpz |