Unsupervised anomaly detection in large databases using Bayesian networks

Cansado, A.; Soto, A.

Abstract

Today, there has been a massive proliferation of huge databases storing valuable information. The opportunities for effective use of these new data sources are enormous; however, the huge size and dimensionality of current large databases call for new ideas to scale up current statistical and computational approaches. This article presents an application of artificial intelligence technology to the problem of automatic detection of candidate anomalous records in a large database. We build our approach with three main goals in mind: 1) an effective detection of the records that are potentially anomalous; 2) a suitable selection of the subset of attributes that explains what makes a record anomalous; and 3) an efficient implementation that allows us to scale the approach to large databases. Our algorithm, called Bayesian network anomaly detector (BNAD), uses the joint probability density function (pdf) provided by a Bayesian network (BN) to achieve these goals. By using appropriate data structures, advanced caching techniques, the flexibility of Gaussian mixture models, and the efficiency of BNs to model joint pdfs, BNAD manages to efficiently learn a suitable BN from a large dataset. We test BNAD using synthetic and real databases, the latter from the fields of manufacturing and astronomy, obtaining encouraging results.
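The idea of scoring records with the joint distribution of a Bayesian network can be illustrated with a small sketch. This is not the authors' BNAD implementation: it assumes a fixed, hand-specified network structure, purely discrete attributes, and Laplace-smoothed probability tables, whereas the abstract describes learning the network from data and using Gaussian mixture models. The function names (`fit_cpts`, `score`), the attribute names, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def fit_cpts(records, parents, alpha=1.0):
    """Estimate Laplace-smoothed conditional tables for a fixed structure.

    `parents` maps each attribute to a tuple of its parent attributes
    (an assumed, hand-written structure, not one learned from data).
    Returns a function giving log P(attribute | parents) for a record.
    """
    values = {a: {r[a] for r in records} for a in parents}
    joint = {a: Counter() for a in parents}    # counts of (parent values, value)
    margin = {a: Counter() for a in parents}   # counts of parent values alone
    for r in records:
        for a, pa in parents.items():
            key = tuple(r[p] for p in pa)
            joint[a][(key, r[a])] += 1
            margin[a][key] += 1

    def log_factor(a, r):
        key = tuple(r[p] for p in parents[a])
        num = joint[a][(key, r[a])] + alpha
        den = margin[a][key] + alpha * len(values[a])
        return math.log(num / den)

    return log_factor


def score(record, parents, log_factor):
    """Return (log joint probability, attribute with the lowest factor)."""
    factors = {a: log_factor(a, record) for a in parents}
    return sum(factors.values()), min(factors, key=factors.get)


# Toy data: temperature depends on machine; one record breaks the pattern.
parents = {"machine": (), "temp": ("machine",)}
records = ([{"machine": "A", "temp": "low"}] * 50
           + [{"machine": "B", "temp": "high"}] * 50
           + [{"machine": "A", "temp": "high"}])

log_factor = fit_cpts(records, parents)
# Sort by log joint probability; the least likely records come first.
ranked = sorted((score(r, parents, log_factor) + (r,) for r in records),
                key=lambda t: t[0])
most_anomalous = ranked[0]  # (log probability, explaining attribute, record)
```

Because the joint probability factorizes into one conditional term per attribute, the term with the lowest value gives a simple indication of which attribute makes a record unusual, which loosely mirrors the second goal stated in the abstract.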

More information

Title according to WOS: Unsupervised anomaly detection in large databases using Bayesian networks
Title according to SCOPUS: Unsupervised anomaly detection in large databases using bayesian networks
Journal title: APPLIED ARTIFICIAL INTELLIGENCE
Volume: 22
Issue: 4
Publisher: TAYLOR & FRANCIS INC
Publication date: 2008
Start page: 309
End page: 330
Language: English
URL: http://www.tandfonline.com/doi/abs/10.1080/08839510801972801
DOI: 10.1080/08839510801972801

Notes: ISI, SCOPUS