PDI - Search Result

Zamorano, Bastian Estay; Firoozabadi, Ali Dehghan; Brutti, Alessio; Adasme, Pablo; Zabala-Blanco, David; Jativa, Pablo Palacios; Azurdia-Meza, Cesar A.

Abstract

Sound event localization and detection (SELD) is a fundamental task in spatial audio processing that involves identifying both the type and location of sound events in acoustic scenes. Current SELD models often struggle with low signal-to-noise ratios (SNRs) and high reverberation. This article addresses SELD by reformulating direction of arrival (DOA) estimation as a multi-class classification task, leveraging deep convolutional recurrent neural networks (CRNNs). We propose and evaluate two modified architectures: M-DOAnet, an optimized version of DOAnet for localization and tracking, and M-SELDnet, a modified version of SELDnet, which has been designed for joint SELD. Both modified models were rigorously evaluated on the STARSS23 dataset, which comprises 13-class, real-world indoor scenes totaling over 7 h of audio, using spectrograms and acoustic intensity maps from first-order Ambisonics (FOA) signals. M-DOAnet achieved exceptional localization (6.00° DOA error, 72.8% F1-score) and perfect tracking (100% MOTA with zero identity switches). It also demonstrated high computational efficiency, training in 4.5 h (164 s/epoch). In contrast, M-SELDnet delivered strong overall SELD performance (0.32 rad DOA error, 0.75 F1-score, 0.38 error rate, 0.20 SELD score), but with significantly higher resource demands, training in 45 h (1620 s/epoch). Our findings underscore a clear trade-off between model specialization and multifunctionality, providing practical insights for designing SELD systems in real-time and computationally constrained environments. © 2025 by the authors.
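The abstract describes recasting DOA estimation as multi-class classification but does not give the class layout. As a minimal sketch of what such a reformulation can look like, the Python below discretizes azimuth and elevation onto a regular grid, so that each grid cell becomes one class, and decodes a class index back to an angle. The 10° step, the row-major class ordering, and the function names `doa_to_class`, `class_to_doa`, and `angular_error_deg` are illustrative assumptions, not details taken from the paper.

```python
import math

# Assumed grid resolution (hypothetical; the record does not state the paper's value).
AZ_STEP_DEG = 10
EL_STEP_DEG = 10
N_AZ = 360 // AZ_STEP_DEG        # 36 azimuth bins covering [-180, 180)
N_EL = 180 // EL_STEP_DEG + 1    # 19 elevation bins covering [-90, 90]
N_CLASSES = N_AZ * N_EL          # one class per grid cell


def doa_to_class(az_deg, el_deg):
    """Map a DOA in degrees to the index of the nearest grid cell."""
    # Azimuth wraps around, so +180 and -180 land in the same bin.
    az_bin = round((az_deg + 180.0) / AZ_STEP_DEG) % N_AZ
    # Elevation does not wrap, so clamp it to the valid bin range.
    el_bin = round((el_deg + 90.0) / EL_STEP_DEG)
    el_bin = min(max(el_bin, 0), N_EL - 1)
    return el_bin * N_AZ + az_bin


def class_to_doa(idx):
    """Decode a class index back to the (azimuth, elevation) of its grid cell."""
    el_bin, az_bin = divmod(idx, N_AZ)
    return az_bin * AZ_STEP_DEG - 180.0, el_bin * EL_STEP_DEG - 90.0


def angular_error_deg(az1, el1, az2, el2):
    """Great-circle angle in degrees between two DOAs given in degrees."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_d = (math.sin(e1) * math.sin(e2)
             + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


if __name__ == "__main__":
    label = doa_to_class(33.0, -12.0)
    az, el = class_to_doa(label)
    print(label, az, el, angular_error_deg(33.0, -12.0, az, el))
```

Under this assumed layout a network would emit one score per grid cell, and the localization error of a prediction is the great-circle angle between the decoded cell and the reference direction. A finer grid lowers the quantization floor on that error at the cost of more output classes.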

More information

Title according to WOS:	Sound Source Localization Using Hybrid Convolutional Recurrent Neural Networks in Undesirable Conditions
Title according to SCOPUS:	Sound Source Localization Using Hybrid Convolutional Recurrent Neural Networks in Undesirable Conditions
Journal title:	Electronics (Switzerland)
Volume:	14
Issue:	14
Publisher:	Multidisciplinary Digital Publishing Institute (MDPI)
Publication date:	2025
Language:	English
DOI:	10.3390/electronics14142778
Notes:	ISI, SCOPUS
