Space-Efficient Computation of the Burrows-Wheeler Transform
Abstract
The Burrows-Wheeler Transform (BWT) has become an essential tool for compressed text indexing. Computing it efficiently and within little space is essential for the practicality of the indexes that build on it. A recent algorithm (Munro, Navarro & Nekrich, SODA 2017) computes the BWT in O(n) time using O(nlgs) bits of space for a text of length n over an alphabet of size sigma. The result is of theoretical nature and its practicality is far from obvious. In this paper we engineer their solution and show that, while a basic implementation is slow in practice, the algorithm is amenable to parallelization. For a wide range of alphabet sizes, our resulting implementation outperforms all the compact constructions in the space/time tradeoff map. On the smallest alphabets we are outperformed in time, but nevertheless achieve the least space within reasonable time. For example, in DNA sequences, the most widely used application of BWTs, our construction uses 4.84 bits per base and builds the BWT at a rate of 2.13 megabases per second, whereas the closest previous alternative uses around 7.09 bits per base and runs at 4.17 megabases per second.
Más información
Título según WOS: | Space-Efficient Computation of the Burrows-Wheeler Transform |
Título según SCOPUS: | Space-Efficient Computation of the Burrows-Wheeler Transform |
Título de la Revista: | 2021 DATA COMPRESSION CONFERENCE (DCC 2021) |
Editorial: | IEEE COMPUTER SOC |
Fecha de publicación: | 2019 |
Página de inicio: | 132 |
Página final: | 141 |
Idioma: | English |
DOI: |
10.1109/DCC.2019.00021 |
Notas: | ISI, SCOPUS |