Sound Source Localization Using Hybrid Convolutional Recurrent Neural Networks in Undesirable Conditions

Zamorano, Bastian Estay; Firoozabadi, Ali Dehghan; Brutti, Alessio; Adasme, Pablo; Zabala-Blanco, David; Jativa, Pablo Palacios; Azurdia-Meza, Cesar A.

Abstract

Sound event localization and detection (SELD) is a fundamental task in spatial audio processing that involves identifying both the type and location of sound events in acoustic scenes. Current SELD models often struggle with low signal-to-noise ratios (SNRs) and high reverberation. This article addresses SELD by reformulating direction of arrival (DOA) estimation as a multi-class classification task, leveraging deep convolutional recurrent neural networks (CRNNs). We propose and evaluate two modified architectures: M-DOAnet, an optimized version of DOAnet for localization and tracking, and M-SELDnet, a modified version of SELDnet, which has been designed for joint SELD. Both modified models were rigorously evaluated on the STARSS23 dataset, which comprises 13-class, real-world indoor scenes totaling over 7 h of audio, using spectrograms and acoustic intensity maps from first-order Ambisonics (FOA) signals. M-DOAnet achieved exceptional localization (6.00 degrees DOA error, 72.8% F1-score) and perfect tracking (100% MOTA with zero identity switches). It also demonstrated high computational efficiency, training in 4.5 h (164 s/epoch). In contrast, M-SELDnet delivered strong overall SELD performance (0.32 rad DOA error, 0.75 F1-score, 0.38 error rate, 0.20 SELD score), but with significantly higher resource demands, training in 45 h (1620 s/epoch). Our findings underscore a clear trade-off between model specialization and multifunctionality, providing practical insights for designing SELD systems in real-time and computationally constrained environments.
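The abstract reports the two localization errors in different units (degrees for M-DOAnet, radians for M-SELDnet) and gives training cost both as a total and per epoch. The short sketch below only redoes that arithmetic so the figures can be read side by side. The variable names are illustrative. The implied epoch counts are inferred from the quoted numbers and are not stated in the abstract. The two DOA errors come from models evaluated on different tasks, so the converted values may not be directly comparable.

```python
import math

# Figures quoted in the abstract (variable names are illustrative).
m_doanet_doa_error_deg = 6.00    # M-DOAnet DOA error, in degrees
m_seldnet_doa_error_rad = 0.32   # M-SELDnet DOA error, in radians
train_hours = {"M-DOAnet": 4.5, "M-SELDnet": 45.0}
sec_per_epoch = {"M-DOAnet": 164.0, "M-SELDnet": 1620.0}

# Express the M-SELDnet error in degrees so both errors share a unit.
m_seldnet_doa_error_deg = math.degrees(m_seldnet_doa_error_rad)

# Epoch count implied by total training time divided by time per epoch.
# This is an inference from the quoted numbers, not a reported value.
implied_epochs = {
    name: train_hours[name] * 3600.0 / sec_per_epoch[name]
    for name in train_hours
}

# Per-epoch cost of M-SELDnet relative to M-DOAnet.
epoch_cost_ratio = sec_per_epoch["M-SELDnet"] / sec_per_epoch["M-DOAnet"]

print(f"M-DOAnet DOA error:  {m_doanet_doa_error_deg:.2f} deg")
print(f"M-SELDnet DOA error: {m_seldnet_doa_error_deg:.2f} deg")
for name, epochs in implied_epochs.items():
    print(f"{name}: about {epochs:.1f} epochs implied")
print(f"Per-epoch cost ratio: {epoch_cost_ratio:.2f}x")
```

Under these figures, 0.32 rad is roughly 18.3 degrees, both training runs work out to about 100 epochs, and M-SELDnet costs close to ten times more per epoch than M-DOAnet.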

More information

WOS ID: WOS:001539766600001
Journal title: ELECTRONICS
Volume: 14
Issue: 14
Publisher: MDPI
Publication date: 2025
DOI: 10.3390/electronics14142778

Notes: ISI