Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures
Abstract
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outperforms Moore's Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey, formed by two parts, we cover the algorithmic developments that have led to these data structures.
More information
| Title according to WOS: | Indexing Highly Repetitive String Collections, Part I: Repetitiveness Measures |
| Journal title: | ACM COMPUTING SURVEYS |
| Volume: | 54 |
| Issue: | 2 |
| Publisher: | ASSOC COMPUTING MACHINERY |
| Publication date: | 2021 |
| DOI: | 10.1145/3434399 |
| Notes: | ISI |