Article

The Role of Speech Technology in User Perception and Context Acquisition in HRI

Journal

INTERNATIONAL JOURNAL OF SOCIAL ROBOTICS
Volume 13, Issue 5, Pages 949-968

Publisher

SPRINGER
DOI: 10.1007/s12369-020-00682-5

Keywords

Human-robot interaction; Perceived robot capability; Synthesized speech; Context acquisition; Speech recognition

Funding

  1. Conicyt-Fondecyt [1151306]
  2. ONRG [62909-17-1-2002]

This paper discusses the role and relevance of speech synthesis and recognition in social robotics, comparing natural synthetic speech with non-linguistic utterances. It also evaluates the importance of knowing the robots' context and the role of synthetic voice in acquiring it. In experiments with one and two robots, users preferred synthetic speech over beep-like audio and preferred entering commands by voice, and the robot's voice affected perceived capability more than the option of voice input did.
This paper addresses the role and relevance of speech synthesis and speech recognition in social robotics. To increase the generality of the study, the interaction of a human with one and with two robots executing tasks was considered. Using these scenarios, a state-of-the-art speech synthesizer was compared with non-linguistic utterances in terms of (1) user preference and (2) the perceived capabilities of the robots; (3) speech recognition was compared with typed text as a means of entering commands, in terms of user preference; and (4) the importance of knowing the robots' context and (5) the role of synthetic voice in acquiring this context were evaluated. Speech synthesis and recognition are different technologies, but generating and understanding speech should be understood as different dimensions of the same spoken language phenomenon. Here, robot context denotes all the information about the operating conditions and the completion status of the task being executed by the robot. Two robotic setups for online experiments were built. With the first setup, in which only one robot was employed, our findings indicate that highly natural synthetic speech is preferred over beep-like audio; that users prefer to enter commands by voice rather than by typing text; and that the robot's voice has a greater effect on the robot's perceived capability than the possibility of entering commands by voice. The analysis presented here suggests that when users interacted with a single robot, its voice as a social cue and cause of anthropomorphization lost relevance as the interaction proceeded, and users could better evaluate the robot's capability with respect to its task. In the experiment with the second setup, a two-robot collaborative testbed was employed. When the robots communicated with each other to sort out problems while trying to accomplish a mission, the user observed the situation from a more distanced position and the reflective perspective dominated. Our results indicate that acquiring the robots' context was perceived as essential for successful human-robot collaboration toward a given objective. For this purpose, synthesized speech was preferred over text on a screen for context acquisition.
