Mitigating the effect of dataset shift in clustering

Maldonado, Sebastian; Saltos, Ramiro; Vairetti, Carla; Delpiano, Jose

Abstract

Dataset shift is a relevant topic in unsupervised learning because many applications face evolving environments, which can cause a significant loss of generalization and performance. Most techniques that deal with this issue are designed for data stream clustering, whose goal is to process sequences of data efficiently in Big Data settings. In this study, we argue that dataset shift is also an issue for static clustering tasks in which data is collected over a long period. To mitigate it, we propose Time-weighted kernel k-means, a k-means variant that includes a time-dependent weighting process. We do this via the induced ordered weighted average (IOWA) operator. The weighting process acts as a gradual forgetting mechanism, prioritizing recent examples over outdated ones in the clustering algorithm. The computational experiments show the potential of Time-weighted kernel k-means in evolving environments.

© 2022 Elsevier Ltd. All rights reserved.
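The abstract describes the method only at a high level. The following is a minimal sketch of one way to combine time-induced, IOWA-style weights with weighted kernel k-means; it is not the authors' implementation. Several choices are assumptions made for illustration: the RBF kernel, the quantifier Q(r) = r**alpha used to build the weighting vector, the farthest-first initialisation, and the function names `iowa_time_weights` and `weighted_kernel_kmeans`. Each point's weight scales its contribution to the implicit cluster centroid in feature space, so recent examples pull centroids more strongly than old ones.

```python
import math
import random


def iowa_time_weights(timestamps, alpha=0.5):
    """IOWA-style weights induced by time (hypothetical scheme, not the paper's).

    Examples are ordered by timestamp, most recent first (the inducing
    variable). The j-th example in that order gets
    w_j = Q(j/n) - Q((j-1)/n) with the assumed quantifier Q(r) = r**alpha.
    For 0 < alpha < 1 the weights decrease with age; alpha = 1 gives uniform
    weights. The weights always sum to Q(1) - Q(0) = 1.
    """
    n = len(timestamps)
    order = sorted(range(n), key=lambda i: timestamps[i], reverse=True)
    weights = [0.0] * n
    for j, i in enumerate(order, start=1):
        weights[i] = (j / n) ** alpha - ((j - 1) / n) ** alpha
    return weights


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def weighted_kernel_kmeans(X, weights, k, gamma=1.0, max_iter=100, seed=0):
    """Weighted kernel k-means: cluster c has the implicit centroid
    m_c = sum_j w_j phi(x_j) / sum_j w_j over its members j."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]

    def dist_to_point(i, s):
        # Squared feature-space distance ||phi(x_i) - phi(x_s)||^2.
        return K[i][i] - 2 * K[i][s] + K[s][s]

    # Farthest-first initialisation (an assumption): start from a random
    # point, then repeatedly add the point farthest from all chosen seeds.
    rng = random.Random(seed)
    seeds = [rng.randrange(n)]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(dist_to_point(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: dist_to_point(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        mass = [sum(weights[j] for j in m) for m in members]
        # ||m_c||^2 does not depend on the point being assigned, so compute it once.
        norm = [
            sum(weights[j] * weights[l] * K[j][l] for j in m for l in m) / s**2 if s > 0 else 0.0
            for m, s in zip(members, mass)
        ]

        def dist_to_centroid(i, c):
            # ||phi(x_i) - m_c||^2 expanded using kernel evaluations only.
            cross = sum(weights[j] * K[i][j] for j in members[c]) / mass[c]
            return K[i][i] - 2 * cross + norm[c]

        # Reassign every point to its nearest non-empty weighted centroid.
        new_labels = [
            min((c for c in range(k) if mass[c] > 0), key=lambda c: dist_to_centroid(i, c))
            for i in range(n)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

In this sketch, `alpha` controls how quickly old examples are forgotten: values closer to 0 concentrate weight on the most recent examples, while `alpha = 1` gives uniform weights and reduces the procedure to ordinary kernel k-means. Storing the full kernel matrix costs O(n²) memory, which suits the static, moderately sized datasets the abstract targets rather than unbounded streams.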

More information

WOS ID: WOS:000870845200008
Journal title: PATTERN RECOGNITION
Volume: 134
Publisher: ELSEVIER SCI LTD
Publication date: 2023
DOI: 10.1016/j.patcog.2022.109058

Notes: ISI