Contextual Pattern Matching
Abstract
The research on indexing repetitive string collections has focused on the same search problems used for regular string collections, though they can make little sense in this scenario. For example, the basic pattern matching query "list all the positions where pattern P appears" can produce huge outputs when P appears in an area shared by many documents. All those occurrences are essentially the same.

In this paper we propose a new query that can be more appropriate in these collections, which we call contextual pattern matching. The basic query of this type gives, in addition to P, a context length $\ell$, and asks to report the occurrences of all distinct strings XPY, with $|X| = |Y| = \ell$. While this query is easily solved in optimal time and linear space, we focus on using space related to the repetitiveness of the text collection and present the first solution of this kind. Letting $\bar{r}$ be the maximum of the number of runs in the BWT of the text T[1..n] and of its reverse, our structure uses $O(\bar{r} \log(n/\bar{r}))$ space and finds the c contextual occurrences XPY of $(P, \ell)$ in time $O(|P| \log\log n + c \log n)$. We give other space/time tradeoffs as well, for compressed and uncompressed indexes.
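
To make the query concrete, the following is a minimal brute-force sketch of contextual pattern matching in Python. It is not the compressed index described in the abstract and it does not achieve the stated time or space bounds; it only illustrates what the query returns. The function name `contextual_occurrences`, the parameter name `ell` for the context length, and the choice to skip occurrences that lack a full context at the text boundaries are all assumptions made for illustration.

```python
def contextual_occurrences(text, pattern, ell):
    """Group the occurrences of `pattern` by their surrounding context.

    Returns a dict mapping each distinct string X + pattern + Y, with
    len(X) == len(Y) == ell, to the 0-based start positions of `pattern`
    that have exactly that context.

    Assumption: occurrences too close to either end of `text` to have a
    full context of length `ell` are skipped.
    """
    groups = {}
    if not pattern:
        return groups
    start = text.find(pattern)
    while start != -1:
        left = start - ell
        right = start + len(pattern) + ell
        if left >= 0 and right <= len(text):
            groups.setdefault(text[left:right], []).append(start)
        # Advance by one so overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return groups


# "ab" occurs three times, but only two distinct contexts exist.
result = contextual_occurrences("xaby_xaby_zabw", "ab", 1)
print(result)  # {'xaby': [1, 6], 'zabw': [11]}
```

Plain pattern matching would report three positions here, while the contextual query reports two distinct strings XPY, which is the output-size reduction the abstract motivates for repetitive collections.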
More information
Title according to WOS: | ID WOS:001344660300001 Not found in local WOS DB |
Journal title: | STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2020 |
Volume: | 12303 |
Publisher: | SPRINGER INTERNATIONAL PUBLISHING AG |
Publication date: | 2020 |
Start page: | 3 |
End page: | 10 |
DOI: | 10.1007/978-3-030-59212-7_1 |
Notes: | ISI |