Out-of-time cross-validation strategies for classification in the presence of dataset shift

Maldonado, Sebastian; Lopez, Julio; Iturriaga, Andres

Abstract

© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Model selection is a highly important step in the process of extracting knowledge from datasets. This is usually done via partitioning strategies such as cross-validation in which the training and test subsets are selected randomly. However, it has been suggested in the literature that this is not the best approach in changing environments due to the risk of data obsolescence. This paper proposes novel out-of-time cross-validation mechanisms for model selection and evaluation designed for binary classification. Our approach extends the reasoning behind the rolling forecasting origin method for time-series analysis, providing an effective methodology for obtaining the prequential performance of a classifier on an out-of-time test sample. Our proposed method also includes a forgetting mechanism for identifying outdated samples that should be ignored in model training. Experiments on simulated and real-world datasets demonstrate the virtues of our approach in relation to various well-known validation strategies.
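As a rough illustration of the general idea described in the abstract (not the authors' algorithm), the sketch below shows a rolling-origin, out-of-time split for a chronologically ordered binary-classification dataset with a simple forgetting window that drops the oldest training samples. The function name rolling_origin_splits, the window parameter, the simulated drift, and the use of scikit-learn for the toy evaluation are illustrative assumptions.

# Minimal sketch, assuming time-ordered samples: each split trains on data
# observed up to the forecasting origin (optionally only the most recent
# `window` samples) and tests on the next, newer out-of-time block.
import numpy as np


def rolling_origin_splits(n_samples, n_splits=5, window=None):
    """Yield (train_idx, test_idx) pairs over time-ordered indices."""
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        origin = k * fold_size
        # Forgetting mechanism (illustrative): keep only the last `window`
        # samples before the origin in the training set.
        start = 0 if window is None else max(0, origin - window)
        train_idx = np.arange(start, origin)
        test_idx = np.arange(origin, min(origin + fold_size, n_samples))
        yield train_idx, test_idx


if __name__ == "__main__":
    # Toy usage: score a logistic regression on each out-of-time block.
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    # Simulate gradual dataset shift: the decision boundary drifts over time.
    drift = np.linspace(0, 2, 1000)
    y = (X[:, 0] + drift * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

    scores = []
    for train_idx, test_idx in rolling_origin_splits(len(X), n_splits=5, window=400):
        clf = LogisticRegression().fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    print("Out-of-time accuracy per fold:", np.round(scores, 3))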

More information

Title according to WOS: Out-of-time cross-validation strategies for classification in the presence of dataset shift
Journal title: APPLIED INTELLIGENCE
Volume: 52
Issue: 5
Publisher: Springer
Publication date: 2022
Start page: 5770
End page: 5783
DOI: 10.1007/S10489-021-02735-2

Notes: ISI