An HMM-DNN-Based System for the Detection and Classification of Low-Frequency Acoustic Signals from Baleen Whales, Earthquakes, and Air Guns off Chile

Buchan, Susannah J.; Duran, Miguel; Rojas, Constanza; Wuth, Jorge; Mahu, Rodrigo; Stafford, Kathleen M.; Yoma, Nestor Becerra


Marine passive acoustic monitoring can be used to study biological, geophysical, and anthropogenic phenomena in the ocean. The wide range of characteristics of geophysical, biological, and anthropogenic sound sources makes the simultaneous automatic detection and classification of these sounds a significant challenge. Here, we propose a single Hidden Markov Model-based system with a Deep Neural Network (HMM-DNN) for the detection and classification of low-frequency biological (baleen whales), geophysical (earthquakes), and anthropogenic (air guns) sounds. Acoustic data were obtained from the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization station off Juan Fernandez, Chile (station HA03), annotated by an analyst (498 h of audio data containing 30,873 events from 19 different classes), and then divided into training (60%), testing (20%), and tuning (20%) subsets. Each audio frame was represented as an observation vector obtained through a filterbank-based spectral feature extraction procedure. The HMM-DNN was trained discriminatively, with HMM states as targets. A model combining Gaussian Mixture Models and HMMs (HMM-GMM) was trained first to obtain an initial set of HMM target states. Feature transformation based on Linear Discriminant Analysis and Maximum Likelihood Linear Transform was also incorporated. The HMM-DNN system displayed good capacity for correctly detecting and classifying events, with high event-level accuracy (84.46%), high weighted average sensitivity (84.46%), and high weighted average precision (89.54%). Event-level accuracy increased with higher event signal-to-noise ratios.
Per-class event-level metrics also showed that our HMM-DNN system generalized well for most classes, but performance was best for classes that had a large number of training exemplars (generally above 50) and/or whose signals had low variability in spectral features, duration, and energy levels. Fin whale song, Antarctic blue whale song, and air guns were classified particularly well.
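
The abstract reports the same value (84.46%) for event-level accuracy and weighted average sensitivity. One plausible reading is that per-class sensitivity is weighted by the number of true events in each class, in which case the two quantities are arithmetically identical. The sketch below illustrates that identity on a small confusion matrix. The function name, the three-class layout, and all counts are hypothetical and are not taken from the paper; the paper's actual scoring (for example, how missed detections and false alarms are counted) is not stated in the abstract.

```python
def event_level_metrics(confusion):
    """Compute accuracy, support-weighted sensitivity, and support-weighted precision.

    confusion[i][j] is the number of events whose true class is i and whose
    predicted class is j (hypothetical layout, assumed for illustration).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Number of true events per class (row sums) and predicted events per class (column sums).
    support = [sum(row) for row in confusion]
    predicted = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    correct = [confusion[k][k] for k in range(n)]

    accuracy = sum(correct) / total

    # Per-class sensitivity (recall) and precision, guarding against empty classes.
    sensitivity = [correct[k] / support[k] if support[k] else 0.0 for k in range(n)]
    precision = [correct[k] / predicted[k] if predicted[k] else 0.0 for k in range(n)]

    # Weight each class by its share of true events (assumed weighting scheme).
    weighted_sensitivity = sum(support[k] * sensitivity[k] for k in range(n)) / total
    weighted_precision = sum(support[k] * precision[k] for k in range(n)) / total
    return accuracy, weighted_sensitivity, weighted_precision


# Toy three-class example with made-up counts (100 events in total).
toy_confusion = [
    [50, 5, 5],
    [10, 20, 0],
    [0, 0, 10],
]
acc, w_sens, w_prec = event_level_metrics(toy_confusion)
print(f"accuracy={acc:.4f} weighted_sensitivity={w_sens:.4f} weighted_precision={w_prec:.4f}")
```

Under this weighting, each class contributes its count of correctly classified events to the numerator, so the weighted sensitivity reduces to the total number of correct events divided by the total number of events, which is the accuracy. Weighted precision has no such identity and can differ from both, as it does in the abstract (89.54%).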

More information

WOS ID: WOS:000998108100001
Journal title: Remote Sensing
Volume: 15
Issue: 10
Publisher: MDPI
Publication date: 2023


Notes: ISI