Computing MEMs on Repetitive Text Collections
Abstract
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1 . . m] on a large repetitive text collection T[1 . . n], which is represented as a (hopefully much smaller) run-length context-free grammar of size grl. We show that the problem can be solved in time O(m2 logϵ n), for any constant ϵ > 0, on a data structure of size O(grl). Further, on a locally consistent grammar of size O(δ log nδ ), the time decreases to O(m log m(log m + logϵ n)). The value δ is a function of the substring complexity of T and Ω(δ log nδ ) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and δ.
Más información
Título según SCOPUS: | ID SCOPUS_ID:85165193289 Not found in local SCOPUS DB |
Título de la Revista: | Leibniz International Proceedings in Informatics, LIPIcs |
Volumen: | 259 |
Fecha de publicación: | 2023 |
DOI: |
10.4230/LIPICS.CPM.2023.24 |
Notas: | SCOPUS |