The Role of Speech Technology in User Perception and Context Acquisition in HRI

Wuth, Jorge; Correa, Pedro; Nunez, Tomas; Saavedra, Matias; Yoma, Nestor Becerra

Abstract

The role and relevance of speech synthesis and speech recognition in social robotics are addressed in this paper. To increase the generality of this study, the interaction of a human being with one and two robots when executing tasks was considered. Using these scenarios, a state-of-the-art speech synthesizer was compared with non-linguistic utterances in terms of (1) human preference and (2) the perceived capabilities of the robots; (3) speech recognition was compared with typed text as a means of entering commands, in terms of user preference; and (4) the importance of knowing the robots' context and (5) the role of synthetic voice in acquiring this context were evaluated. Speech synthesis and recognition are different technologies, but generating and understanding speech should be understood as different dimensions of the same spoken language phenomenon. Also, robot context denotes all the information about operating conditions and completeness status of the task that is being executed by the robot. Two robotic setups for online experiments were built. With the first setup, where only one robot was employed, our findings indicate that: highly natural synthetic speech is preferred over beep-like audio; users also prefer to enter commands by voice rather than by typing text; and the robot voice has a more important effect on the robot's perceived capability than the possibility of entering commands by voice. The analysis presented here suggests that when the users interacted with a single robot, its voice as a social cue and cause of anthropomorphization lost relevance as the interaction was carried out and the users could better evaluate the robot's capability with respect to its task. In the experiment with the second setup, a two-robot collaborative testbed was employed. When the robots communicated with each other to sort out problems while trying to accomplish a mission, the user observed the situation from a more distanced position and the "reflective" perspective dominated. Our results indicate that acquiring the robots' context was perceived as essential for a successful human-robot collaboration to accomplish a given objective. For this purpose, synthesized speech was preferred over text on a screen for context acquisition.

More information

Title according to WOS: The Role of Speech Technology in User Perception and Context Acquisition in HRI
Journal title: INTERNATIONAL JOURNAL OF SOCIAL ROBOTICS
Issue: 5
Publisher: Springer
Publication date: 2020
DOI: 10.1007/S12369-020-00682-5

Notes: ISI