Improved compressed indexes for full-text document retrieval
Keywords: information, length, frequencies, index, arbitrary, query, retrieval, processing, documents, Functions, Total, Document, Answer, Full-text, Hash
Abstract
We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least or |CSA| + O(n/lg D/lg lg D) or 2 |CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies. © 2011 Springer-Verlag.
Más información
Título de la Revista: | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volumen: | 7024 |
Editorial: | Society of Laparoendoscopic Surgeons |
Fecha de publicación: | 2011 |
Página de inicio: | 386 |
Página final: | 397 |
URL: | http://www.scopus.com/inward/record.url?eid=2-s2.0-80053949092&partnerID=q2rCbXpz |