Computing MEMs on Repetitive Text Collections

Abstract

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1 . . m] on a large repetitive text collection T[1 . . n], which is represented as a (hopefully much smaller) run-length context-free grammar of size grl. We show that the problem can be solved in time O(m2 logϵ n), for any constant ϵ > 0, on a data structure of size O(grl). Further, on a locally consistent grammar of size O(δ log nδ ), the time decreases to O(m log m(log m + logϵ n)). The value δ is a function of the substring complexity of T and Ω(δ log nδ ) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and δ.

Más información

Título según SCOPUS: ID SCOPUS_ID:85165193289 Not found in local SCOPUS DB
Título de la Revista: Leibniz International Proceedings in Informatics, LIPIcs
Volumen: 259
Fecha de publicación: 2023
DOI:

10.4230/LIPICS.CPM.2023.24

Notas: SCOPUS