Estimating the Number of Speakers by Novel Zig-Zag Nested Microphone Array Based on Wavelet Packet and Adaptive GCC Method
Abstract
This paper proposes a new speaker counting algorithm that combines a novel zig-zag nested array (ZZNA) with an adaptive generalized cross-correlation (GCC) function, using phase transform (PHAT) and maximum likelihood (ML) weighting, the wavelet packet transform (WPT), and an agglomerative classification method with the Elbow decision criterion. The ZZNA is designed to cover the acoustic environment and eliminate spatial aliasing. The WPT, with varying frequency resolution, then divides the signals into frequency subbands. The adaptive GCC function, based on the PHAT and ML weighting filters, is computed on the microphone pairs in each subband. Finally, the unsupervised agglomerative classification method with the Elbow criterion classifies the resulting information and counts the speakers. The proposed ZZNA-WAGC method is compared with the Hilbert envelope, multi-channel correlational recurrent neural network using ambisonics features (AF-CRNN), and estimating the number of speakers by density-based classification and clustering decision (ENS-DCCD) algorithms to demonstrate its superiority in adverse scenarios.
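To illustrate the GCC step named in the abstract, the following is a minimal sketch of the standard PHAT-weighted cross-correlation for a single microphone pair, written with NumPy. It is not the paper's adaptive PHAT/ML combination, and it works on full-band signals rather than WPT subbands; the function names `gcc_phat` and `estimate_delay`, the `max_lag` parameter, and the `eps` regularizer are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x, y, max_lag=None, eps=1e-12):
    """PHAT-weighted cross-correlation of x and y (illustrative sketch).

    Returns (lags, cc). A peak at lag d > 0 means x lags y by d samples.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)              # cross-power spectrum
    cross /= np.abs(cross) + eps        # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    if max_lag is None:
        max_lag = n // 2
    max_lag = max(1, min(max_lag, n // 2))
    # Reorder so that lags run from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc

def estimate_delay(x, y, max_lag=None):
    """Lag (in samples) at which the GCC-PHAT function peaks."""
    lags, cc = gcc_phat(x, y, max_lag)
    return int(lags[np.argmax(cc)])
```

In a pipeline like the one the abstract outlines, a function of this kind would be evaluated per microphone pair and per subband, and the resulting peaks would be passed to the clustering stage; how the paper adapts between the PHAT and ML weightings is not stated in this record.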
More information
Title according to SCOPUS: | ID SCOPUS_ID:85146965808 Not found in local SCOPUS DB |
Publication date: | 2022 |
Start page: | 358 |
End page: | 363 |
DOI: |
10.1109/ICSC56524.2022.10009025 |
Notes: | SCOPUS |