Distributed text search using suffix arrays

Arroyuelo, Diego; Bonacic, Carolina; Gil-Costa, Verónica; Marin, Mauricio; Navarro, Gonzalo

Abstract

Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, where it is necessary to efficiently process intensive-traffic streams of on-line queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that hardware resources are used efficiently, is relevant to services aimed at achieving high query throughput at low operational costs. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable on-line search services based on suffix arrays. (C) 2014 Elsevier B.V. All rights reserved.

More information

Title according to WOS: Distributed text search using suffix arrays
Journal title: PARALLEL COMPUTING
Volume: 40
Issue: 9
Publisher: ELSEVIER SCIENCE BV
Publication date: 2014
Start page: 471
End page: 495
Language: English
DOI: 10.1016/j.parco.2014.06.007

Notes: ISI - ISI