Faster Top-k Document Retrieval in Optimal Space
Abstract
We consider the problem of retrieving the k documents from a collection of strings where a given pattern P appears most often. We show that, by representing the collection using a Compressed Suffix Array CSA, a data structure using the asymptotically optimal vertical bar CSA vertical bar + o(n) bits can answer queries in the time needed by CSA to find the suffix array interval of the pattern plus O( k lg(2) k lg(epsilon) n) accesses to suffix array cells, for any constant epsilon > 0. This is lg n/lg k times faster than the only previous solution using optimal space, lg k times slower than the fastest structure that uses twice the space, and lg 2 k lg(epsilon) n times the lower-bound cost of obtaining k document identifiers from the CSA. To obtain the result we introduce a tool called the sampled document array, which can be of independent interest.
Más información
| Título según WOS: | Faster Top-k Document Retrieval in Optimal Space |
| Título de la Revista: | GAMES AND LEARNING ALLIANCE, GALA 2024 |
| Volumen: | 8214 |
| Editorial: | SPRINGER INTERNATIONAL PUBLISHING AG |
| Fecha de publicación: | 2013 |
| Página de inicio: | 255 |
| Página final: | 262 |
| Idioma: | English |
| Notas: | ISI |