DNN-HMM based Automatic Speech Recognition for HRI Scenarios

Novoa, José; Wuth, Jorge; Escudero, Juan Pablo; Fredes, Josué; Mahu, Rodrigo; Becerra Yoma, Néstor

Keywords: speech recognition, DNN-HMM, time-varying acoustic channel

Abstract

In this paper, we propose replacing the classical black-box integration of automatic speech recognition (ASR) technology in human-robot interaction (HRI) applications with an approach that incorporates the representation and modeling of the HRI environment, together with the states and contexts of the robot and the user. Accordingly, this paper focuses on environment representation and modeling: a deep neural network-hidden Markov model (DNN-HMM) based ASR engine is trained on clean utterances combined with the acoustic-channel responses and noise obtained from an HRI testbed built with a PR2 mobile manipulation robot. This method avoids having to record a training database in every possible acoustic environment of a given HRI scenario. Moreover, different speech recognition testing conditions were produced by recording two types of acoustic sources, i.e., a loudspeaker and human speakers, using a Microsoft Kinect mounted on top of the PR2 robot while it performed head rotations and movements towards and away from the fixed sources. In this generic HRI scenario, and with a limited amount of training data, the resulting ASR engine achieved a word error rate at least 26% and 38% lower than publicly available speech recognition APIs on the playback (i.e., loudspeaker) and human testing databases, respectively.
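
The data-combination idea in the abstract (clean utterances filtered by a measured acoustic-channel response, with recorded noise added) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the use of a single impulse response per utterance, and the `snr_db` parameter are assumptions made for illustration, and a real system would likely use an array library rather than plain Python lists.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(samples):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def simulate_utterance(clean, impulse_response, noise, snr_db):
    """Hypothetical helper: filter clean speech with a channel response,
    then add noise scaled to the requested signal-to-noise ratio (dB)."""
    # Apply the acoustic channel and keep the original utterance length.
    filtered = convolve(clean, impulse_response)[:len(clean)]
    # Repeat the noise recording if it is shorter than the utterance.
    tiled = [noise[i % len(noise)] for i in range(len(filtered))]
    # Choose the gain so that 10*log10(P_speech / P_noise) equals snr_db.
    gain = math.sqrt(power(filtered) / (power(tiled) * 10 ** (snr_db / 10)))
    return [f + gain * n for f, n in zip(filtered, tiled)]


# Toy stand-ins for a clean utterance, a channel response, and a noise clip.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
impulse_response = [1.0, 0.5, 0.25]
noise = [rng.gauss(0.0, 1.0) for _ in range(50)]
noisy = simulate_utterance(clean, impulse_response, noise, snr_db=10.0)
```

Repeating this over many channel responses and noise segments would yield training data for several acoustic conditions from a single clean corpus, which is the benefit the abstract describes.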

More information

Publication date: 2018
Start/End date: March 05-08, 2018
First page: 150
Last page: 159
Language: English
URL: https://doi.org/10.1145/3171221.3171280
DOI: 10.1145/3171221.3171280

Novoa2018b