Bi-Directional r-Indexes

Arakawa, Yuma; Sadakane, Kunihiko

Abstract

Indexing highly repetitive texts is important in fields such as bioinformatics and versioned repositories. The run-length compression of the Burrows-Wheeler transform (BWT) provides a compressed representation particularly well-suited to text indexing. The r-index is one such index. It enables fast locating of occurrences of a pattern within O(r) words of space, where r is the number of equal-letter runs in the BWT. Its mechanism of locating is to maintain one suffix array sample along the backward-search of the pattern, and to compute all the pattern positions from that sample once the backward-search is complete. In this paper we develop this algorithm further, and propose a new bi-directional text index called the br-index, which supports extending the matched pattern both in forward and backward directions, and locating the occurrences of the pattern at any step of the search, within O(r + rR) words of space, where rR is the number of equal-letter runs in the BWT of the reversed text. Our experiments show that the br-index captures the long repetitions of the text, and outperforms the existing indexes in text searching allowing some mismatches except in an internal part.

Más información

Título según SCOPUS: Bi-Directional r-Indexes
Título de la Revista: Leibniz International Proceedings in Informatics, LIPIcs
Volumen: 223
Fecha de publicación: 2022
DOI:

10.4230/LIPICS.CPM.2022.11

Notas: SCOPUS