Batch and online anomaly detection for scientific applications in a Kubernetes environment
Abstract
We present a cloud-based anomaly detection service framework that uses a containerized Spark cluster and ancillary user interfaces, all managed by Kubernetes. This technology stack provides a fast, reliable, resilient, and easily scalable service for either batch or streaming data. At the heart of the service is Extended Isolation Forest, an improved version of the Isolation Forest algorithm, which provides robust and efficient anomaly detection. We showcase the design and a typical workflow of our infrastructure, which is ready to deploy on any Kubernetes cluster without extra technical knowledge. With exposed APIs and simple graphical interfaces, users can load any data and detect anomalies in the loaded set or in newly presented data points, in either batch or streaming mode. In streaming mode, users can subscribe to the desired output and receive notifications. Our aim is to develop these techniques and apply them to scientific data. In particular, we are interested in finding anomalous objects within the overwhelming set of images and catalogs produced by current and future astronomical surveys, although the approach can easily be adapted to other fields.
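As a rough illustration of the Extended Isolation Forest idea named in the abstract, the sketch below builds trees that split the data with randomly oriented hyperplanes rather than axis-aligned cuts, then scores points by their average isolation depth. It is a minimal single-machine sketch in plain Python, not the authors' Spark-based implementation; every function name and parameter here (`fit_forest`, `anomaly_score`, `n_trees`, `sample_size`) is hypothetical and chosen only for this example.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points with a random hyperplane (hypothetical helper)."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = len(points[0])
    # Random normal vector: this is what distinguishes the extended variant
    # from axis-aligned cuts in the standard Isolation Forest.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Random intercept point drawn inside the bounding box of the node's data.
    lo = [min(p[i] for p in points) for i in range(dim)]
    hi = [max(p[i] for p in points) for i in range(dim)]
    intercept = [rng.uniform(lo[i], hi[i]) for i in range(dim)]
    left, right = [], []
    for p in points:
        side = sum((p[i] - intercept[i]) * normal[i] for i in range(dim))
        (left if side <= 0 else right).append(p)
    return {
        "normal": normal,
        "intercept": intercept,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands in a leaf, plus a correction for leaf size."""
    if "normal" not in node:
        return depth + c_factor(node["size"])
    side = sum(
        (point[i] - node["intercept"][i]) * node["normal"][i]
        for i in range(len(point))
    )
    child = node["left"] if side <= 0 else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Batch step: build the forest from subsamples of the loaded data set."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))  # assumes at least 2 points
    max_depth = math.ceil(math.log2(sample_size))
    forest = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return forest, sample_size


def anomaly_score(point, forest, sample_size):
    """Score in (0, 1]; higher means the point is isolated in fewer splits."""
    mean_depth = sum(path_length(point, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Synthetic stand-in data: a 2-D Gaussian blob around the origin.
data_rng = random.Random(42)
train = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(256)]
forest, psi = fit_forest(train)

# Newly presented points can be scored one at a time against the fitted forest.
score_far = anomaly_score((8.0, 8.0), forest, psi)
score_center = anomaly_score((0.0, 0.0), forest, psi)
```

In this sketch, `fit_forest` plays the role of the batch step on a loaded data set, while calling `anomaly_score` on each arriving point approximates how newly presented data could be scored in a streaming setting. A point far from the training cloud should receive a higher score than one at its center.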
More information
Title according to WOS: | ID WOS:000473395700003 Not found in local WOS DB |
Journal title: | PROCEEDINGS OF THE ACM WORKSHOP ON SCIENTIFIC CLOUD COMPUTING (SCIENCECLOUD'18) |
Publisher: | ASSOC COMPUTING MACHINERY |
Publication date: | 2018 |
DOI: | 10.1145/3217880.3217883 |
Notes: | ISI |