DNN-HMM based Automatic Speech Recognition for HRI Scenarios
Keywords: speech recognition, DNN-HMM, time-varying acoustic channel
Abstract
In this paper, we propose to replace the classical black-box integration of automatic speech recognition (ASR) technology in HRI applications with a scheme that incorporates a representation and model of the HRI environment, together with the robot and user states and contexts. Accordingly, this paper focuses on environment representation and modeling by training a deep neural network-hidden Markov model (DNN-HMM) based ASR engine, combining clean utterances with the acoustic-channel responses and noise that were obtained from an HRI testbed built with a PR2 mobile manipulation robot. This method avoids recording a training database in every possible acoustic environment of a given HRI scenario. Moreover, different speech recognition testing conditions were produced by recording two types of acoustic sources, i.e., a loudspeaker and human speakers, with a Microsoft Kinect mounted on top of the PR2 robot while the robot performed head rotations and movements towards and away from the fixed sources. In this generic HRI scenario, and with a limited amount of training data, the resulting ASR engine achieved a word error rate at least 26% and 38% lower than publicly available speech recognition APIs on the playback (i.e., loudspeaker) and human testing databases, respectively.
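The training-data strategy described in the abstract, generating noisy, reverberant training material by combining clean utterances with measured acoustic-channel responses and noise, can be sketched as below. This is a minimal illustrative example, not the paper's exact pipeline: the function name, the use of a single fixed impulse response per utterance, and the SNR-based noise mixing are all assumptions for the sake of a self-contained demo.

```python
import numpy as np

def simulate_channel(clean, impulse_response, noise, snr_db):
    """Pass a clean utterance through an acoustic channel and add noise.

    `clean`, `impulse_response`, and `noise` are 1-D float arrays at a
    common sample rate; `snr_db` is the target signal-to-noise ratio in dB.
    Illustrative sketch only (hypothetical helper, not from the paper).
    """
    # Apply the acoustic-channel response (here a single measured RIR),
    # trimming the convolution tail to keep the utterance length.
    reverberant = np.convolve(clean, impulse_response)[: len(clean)]

    # Tile or trim the noise recording to match the utterance length.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so that 10*log10(P_signal / P_noise) == snr_db.
    p_signal = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```

Running each clean training utterance through such a function, with channel responses and noise recorded once in the target testbed, yields a multi-condition training set without re-recording the speech database in every acoustic configuration.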
More information
Publication date: 2018
Start/end dates: March 05-08, 2018
First page: 150
Last page: 159
Language: English
URL: https://doi.org/10.1145/3171221.3171280
DOI: 10.1145/3171221.3171280
Novoa2018b |