On the reproducibility of experiments of indexing repetitive document collections

Fariña, Antonio; Martínez-Prieto, Miguel A.; Claude, Francisco; Navarro, Gonzalo; Lastra-Díaz, Juan J.; Prezza, Nicola; Seco, Diego

Abstract

This work introduces a companion reproducible paper that aims to allow the exact replication of the methods, experiments, and results discussed in a previous work (Claude et al., 2016). In that parent paper, we proposed a variety of techniques for compressing indexes that exploit the fact that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, providing precise details about the parameters that can be tuned for each indexing solution. Finally, note that we also provide uiHRDC as a reproducibility package. © 2019 Elsevier Ltd. All rights reserved.

More information

Title according to WOS: On the reproducibility of experiments of indexing repetitive document collections
Title according to SCOPUS: On the reproducibility of experiments of indexing repetitive document collections
Journal title: INFORMATION SYSTEMS
Volume: 83
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
Publication date: 2019
Start page: 181
End page: 194
Language: English
DOI: 10.1016/j.is.2019.03.007

Notes: ISI, SCOPUS