Bi-Directional r-Indexes
Abstract
Indexing highly repetitive texts is important in fields such as bioinformatics and versioned repositories. The run-length compression of the Burrows-Wheeler transform (BWT) provides a compressed representation particularly well-suited to text indexing. The r-index is one such index. It enables fast locating of occurrences of a pattern within O(r) words of space, where r is the number of equal-letter runs in the BWT. Its mechanism of locating is to maintain one suffix array sample along the backward-search of the pattern, and to compute all the pattern positions from that sample once the backward-search is complete. In this paper we develop this algorithm further, and propose a new bi-directional text index called the br-index, which supports extending the matched pattern both in forward and backward directions, and locating the occurrences of the pattern at any step of the search, within O(r + rR) words of space, where rR is the number of equal-letter runs in the BWT of the reversed text. Our experiments show that the br-index captures the long repetitions of the text, and outperforms the existing indexes in text searching allowing some mismatches except in an internal part.
Más información
Título según SCOPUS: | Bi-Directional r-Indexes |
Título de la Revista: | Leibniz International Proceedings in Informatics, LIPIcs |
Volumen: | 223 |
Fecha de publicación: | 2022 |
DOI: |
10.4230/LIPICS.CPM.2022.11 |
Notas: | SCOPUS |