Computing MEMs on Repetitive Text Collections

Abstract

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1 . . m] on a large repetitive text collection T[1 . . n], which is represented as a (hopefully much smaller) run-length context-free grammar of size grl. We show that the problem can be solved in time O(m2 log? n), for any constant ? > 0, on a data structure of size O(grl). Further, on a locally consistent grammar of size O(? log n? ), the time decreases to O(m log m(log m + log? n)). The value ? is a function of the substring complexity of T and ?(? log n? ) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and ?. © 2023 Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. All rights reserved.

Más información

Título según SCOPUS: Computing MEMs on Repetitive Text Collections
Título de la Revista: Leibniz International Proceedings in Informatics, LIPIcs
Volumen: 259
Editorial: Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
Fecha de publicación: 2023
Idioma: English
DOI:

10.4230/LIPIcs.CPM.2023.24

Notas: SCOPUS