Distant speech emotion recognition in an indoor human-robot interaction scenario

Grágeda, N.; Busso, C.; Alvarado, E.; Mahu, R.; Yoma, N.B.

Keywords: human-computer interaction, speech emotion recognition

Abstract

Social robotics and human-robot collaboration are increasingly relevant topics that pose many challenges for state-of-the-art speech technology. This paper presents the first evaluation of speech emotion recognition (SER) technology with non-acted speech data recorded in a real indoor human-robot interaction (HRI) scenario. The challenge is characterized by distant speech processing, reverberation, and additive external and robot engine noise. We train and evaluate a machine learning-based SER system using simulated acoustic modelling that incorporates room impulse responses (RIRs), external noise, and the beamforming response. The proposed training procedure, combined with delay-and-sum and minimum variance distortionless response (MVDR) beamforming, improves the prediction of arousal, valence, and dominance, with gains as high as 180% compared with a model trained on the original data recorded in controlled environments. Moreover, the degradation relative to the original matched training/testing condition is only 39%.
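To make the data-simulation idea concrete, the sketch below illustrates the kind of acoustic augmentation the abstract describes: convolving clean speech with a room impulse response and mixing in noise at a target SNR, plus a minimal integer-delay delay-and-sum beamformer. This is an illustrative Python sketch only, not the authors' implementation; the function names, signatures, and the use of numpy/scipy are assumptions.

    import numpy as np
    from scipy.signal import fftconvolve

    def simulate_distant_speech(clean, rir, noise, snr_db):
        """Hypothetical augmentation step: reverberate clean speech with an
        RIR and add noise at a chosen SNR. All inputs are 1-D float arrays
        at the same sampling rate."""
        # Reverberate the clean signal with the (possibly simulated) RIR.
        reverberant = fftconvolve(clean, rir, mode="full")[: len(clean)]
        # Loop or trim the noise to match the signal length.
        noise = np.resize(noise, len(reverberant))
        # Scale the noise so the mixture reaches the requested SNR.
        sig_power = np.mean(reverberant ** 2)
        noise_power = np.mean(noise ** 2) + 1e-12
        gain = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
        return reverberant + gain * noise

    def delay_and_sum(channels, delays_samples):
        """Minimal delay-and-sum beamformer: time-align each microphone
        channel by an integer sample delay and average. Wrap-around from
        np.roll is ignored for brevity in this sketch."""
        aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays_samples)]
        return np.mean(aligned, axis=0)

In practice, the simulated multichannel training data would be generated by applying per-microphone RIRs and noise, then passing the channels through the same beamformer used at test time, so that training and deployment conditions match.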

More information

Title according to WOS: Distant speech emotion recognition in an indoor human-robot interaction scenario
Title according to SCOPUS: Distant speech emotion recognition in an indoor human-robot interaction scenario
Journal title: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-
Publisher: International Speech Communication Association
Publication date: 2023
First page: 3657
Last page: 3661
Language: English
DOI: 10.21437/Interspeech.2023-1169

Notes: ISI, SCOPUS