Indexing text with approximate q-grams

Navarro G.; Sutinen E.; Tanninen, J; Tarhio J.

Abstract

We present a new index for approximate string matching. The index collects text q-samples, i.e. disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. We show experimentally that the parameterization mechanism of the related filtration scheme provides a compromise between the space requirement of the index and the error level for which the filtration is still efficient.

Más información

Título según WOS: Indexing text with approximate q-grams
Título de la Revista: STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2020
Volumen: 1848
Editorial: SPRINGER INTERNATIONAL PUBLISHING AG
Fecha de publicación: 2000
Página de inicio: 350
Página final: 363
Idioma: English
Notas: ISI