Relative Lempel-Ziv with Constant-Time Random Access

Ferrada, Hector; Gagie, Travis; Gog, Simon; Puglisi, Simon J.; Moura, E; Crochemore, M

Abstract

Relative Lempel-Ziv (RLZ) is a variant of LZ77 that can compress well collections of similar genomes while still allowing fast random access to them. In theory, at the cost of using sublinear extra space, accessing an arbitrary character takes constant time. We show that even in practice this works quite well: e.g., we can compress 36 S. cerevisiae genomes from a total of 464 MB to 11 MB and still support random access to them in under 50 nanoseconds per character, even when the accessed substrings are short. Our theoretical contribution is an optimized representation of RLZ's pointers.

Más información

Título según WOS: Relative Lempel-Ziv with Constant-Time Random Access
Título de la Revista: BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II
Volumen: 8799
Editorial: SPRINGER INTERNATIONAL PUBLISHING AG
Fecha de publicación: 2014
Página de inicio: 13
Página final: 17
Notas: ISI