COMPENSATING ACOUSTIC MISMATCH FOR ROBUST SPEAKER VERIFICATION

Poblete, Victor; González, Isaac; Astudillo, Alexandra; Vergara, Gastón; The International Institute of Acoustics and Vibration (IIAV) and the UK’s Institute of Acoustics

Keywords: speaker verification, feature extraction, additive noise, anechoic chamber, distant speech.

Abstract

Automatic speaker verification works best when the user speaks close to the microphone in a quiet environment. In practice, interaction with such systems may involve variations in speaker-microphone distance, a factor that, together with the additive noise of a room, can dramatically decrease the intelligibility and quality of the recorded speech signal and sharply increase the equal error rate (EER). In this work, we extract two sets of features, MFCC (Mel Frequency Cepstral Coefficients) and LNCC (Locally Normalized Cepstral Coefficients), to address the acoustic mismatch between training and verification environments. To analyze how robustly these features compensate for acoustic mismatch, we perform several text-independent speaker verification (TI-SV) experiments with signals corrupted by additive noise at different signal-to-noise ratios (SNRs) and recorded at different loudspeaker-microphone distances within the same room. The reverberation time (T60) of an anechoic chamber is determined for four loudspeaker-microphone distances. At each distance, versions of the YOHO speech corpus are re-recorded sequentially with a single microphone. Five types of noise are selected and recorded in the anechoic chamber. These noises are added to the YOHO versions to generate noisy utterances at six SNRs: 20 dB, 15 dB, 10 dB, 5 dB, 0 dB and -5 dB. In total, we processed 3920 testing utterances x 4 distances x 5 noise types x 6 SNRs = 470,400 signals. Our results indicate that LNCC provides relative reductions in EER over standard MFCC. The highest reductions are obtained with Airplane noise at an SNR of 10 dB and a loudspeaker-microphone distance of 0.94 m: as high as 68% and 56% compared with MFCC+CMN and MFCC+RASTA processing, respectively.
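As an illustration of how noisy test signals at a target SNR can be generated, the following is a minimal sketch in Python with NumPy. It is not the authors' procedure: the abstract does not state how the noise was scaled, so the sketch assumes the SNR is defined from the average power over the whole utterance, with no voice activity detection or frequency weighting. The function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the sine wave and random noise are stand-ins for a re-recorded YOHO utterance and a recorded noise.

```python
import numpy as np

# SNR conditions listed in the abstract, in dB.
SNRS_DB = [20, 15, 10, 5, 0, -5]


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR.

    Assumption: SNR is the ratio of average powers over the full signal.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Gain g such that p_speech / (g^2 * p_noise) = 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    speech = np.asarray(speech, dtype=np.float64)
    residual = np.asarray(noisy, dtype=np.float64) - speech
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2))


# Stand-in signals (assumptions, not data from the paper).
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noise = rng.standard_normal(4000)

noisy_versions = {snr: mix_at_snr(speech, noise, snr) for snr in SNRS_DB}

# Number of test signals implied by the experimental design in the abstract.
n_signals = 3920 * 4 * 5 * len(SNRS_DB)
```

Each entry of `noisy_versions` holds one noisy copy of the stand-in utterance, and `n_signals` reproduces the count of processed signals given in the abstract (utterances x distances x noise types x SNRs).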

More information

Publisher: The International Institute of Acoustics and Vibration (IIAV)
Publication date: 2017
Start page: 1
End page: 4
Language: English
Funding/Sponsor: Funded by grant Universidad Austral de Chile DID-UACh 2015-63