An LMS-Based Grammar Self-index with Local Consistency Properties

Abstract

A grammar self-index of a text T (Claude et al. 2012) consists of a grammar G that only produces T and a geometric data structure that indexes the string cuts of the right-hand sides of G ’s rules. This representation uses space proportional to G, the size of the grammar, which is small when the text is repetitive. However, the index is slow for matching long patterns; it finds the occ occurrences of a pattern P[1.m] in O((m2+ occ) log G) time. The most expensive part is a set of binary searches for the different cuts P[ 1. j] P[ j+ 1. m] in the geometric data structure. Christiansen et al. 2010 solved this problem by building a locally consistent grammar that only searches for O(log m) cuts of P. Their representation, however, requires significant extra space (tough still in O(G)) to store a set of permutations for the nonterminal symbols. In this work, we propose another locally consistent grammar that builds on the idea of LMS substrings (Nong et al. 2009). Our grammar also requires to try O(log m) cuts when searching for P, but it does not need to store permutations. As a result, we obtain a self-index that searches in time O((mlog m+ occ) log G) and is of practical size. Our experiments showed that our index is faster than previous grammar-based indexes at the price of increasing the space by a 1.8x factor on average. Other experimental results showed that our data structure becomes convenient when the patterns to search for are long.

Más información

Título según SCOPUS: An LMS-Based Grammar Self-index with Local Consistency Properties
Título de la Revista: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volumen: 12944
Editorial: Springer Science and Business Media Deutschland GmbH
Fecha de publicación: 2021
Página final: 113
Idioma: English
DOI:

10.1007/978-3-030-86692-1_9

Notas: SCOPUS