Distant speech emotion recognition in an indoor human-robot interaction scenario
Keywords: human-computer interaction, speech emotion recognition
Abstract
Social robotics and human-robot partnership are highly relevant topics that pose many challenges for state-of-the-art speech technology. This paper presents the first evaluation of speech emotion recognition (SER) technology with non-acted speech data recorded in a real indoor human-robot interaction (HRI) scenario. The challenge is typified by distant speech processing, reverberation, and additive external and robot engine noise. We train and evaluate a machine learning-based SER system using simulated acoustic modelling that incorporates room impulse responses (RIRs), external noise, and the beamforming response. With the proposed training procedure combined with delay-and-sum and minimum variance distortionless response (MVDR) beamforming, we observe improved prediction of arousal, valence, and dominance, with gains as high as 180% compared with the model trained on the original data recorded in controlled environments. Moreover, the degradation relative to the original matched training/testing condition is only 39%.
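The simulated acoustic modelling described above, i.e. convolving clean speech with an RIR and adding environmental noise, can be sketched as below. This is a minimal illustration of the general technique, not the authors' implementation: the function name, the SNR-based noise scaling, and the use of NumPy/SciPy are assumptions for the example.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_far_field(clean, rir, noise, snr_db):
    """Simulate distant speech: convolve clean speech with a room impulse
    response (RIR), then add noise scaled to a target SNR in dB.
    Illustrative sketch only; beamforming is not modelled here."""
    # Reverberant speech: linear convolution, truncated to the input length.
    reverberant = fftconvolve(clean, rir)[: len(clean)]
    speech_power = np.mean(reverberant ** 2)
    noise = noise[: len(reverberant)]
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(speech_power / noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + scale * noise
```

In a training pipeline of this kind, each clean utterance would typically be paired with a randomly chosen RIR and noise segment so the model sees many simulated room/noise conditions.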
More information
| Title according to WOS: | Distant speech emotion recognition in an indoor human-robot interaction scenario |
| Title according to SCOPUS: | Distant speech emotion recognition in an indoor human-robot interaction scenario |
| Journal title: | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume: | 2023- |
| Publisher: | International Speech Communication Association |
| Publication date: | 2023 |
| First page: | 3657 |
| Last page: | 3661 |
| Language: | English |
| DOI: | 10.21437/Interspeech.2023-1169 |
| Notes: | ISI, SCOPUS |