Top-k document retrieval in optimal time and linear space

Navarro G.; Nekrich Y.

Keywords: distance, patterns, size, structures, optimization, time, algorithms, spaces, data, relevance, relevant, query, retrieval, measure, documents, word, Linear, Optimal, minimal, Document, Alphabet

Abstract

We describe a data structure that uses O (n)-word space and reports k most relevant documents that contain a query pattern P in optimal O(|P| + k) time. Our construction supports an ample set of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We show how to reduce the space of the data structure from O(n log n) to O (n(log ? + log D + log log n)) bits, where ? is the alphabet size and D is the total number of documents. Copyright © SIAM.

Más información

Título de la Revista: Unknown (9781611972108)
Editorial: Unknown
Fecha de publicación: 2012
Página de inicio: 1066
Página final: 1077
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-84860191488&partnerID=40&md5=fe7896298caf678462e7e79909e8112e