Improved compressed indexes for full-text document retrieval

Belazzougui D.; Navarro G.

Keywords: information, length, frequencies, index, arbitrary, query, retrieval, processing, documents, Functions, Total, Document, Answer, Full-text, Hash

Abstract

We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least or |CSA| + O(n/lg D/lg lg D) or 2 |CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies. © 2011 Springer-Verlag.

Más información

Título de la Revista: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volumen: 7024
Editorial: Society of Laparoendoscopic Surgeons
Fecha de publicación: 2011
Página de inicio: 386
Página final: 397
URL: http://www.scopus.com/inward/record.url?eid=2-s2.0-80053949092&partnerID=q2rCbXpz