Article

Voice Orientation Recognition: New Paradigm of Speech-Based Human-Computer Interaction

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/10447318.2023.2233128

Keywords

Voice orientation recognition; human-computer interaction; speech interaction; mouth radiation pattern; attention mechanism

Speech-based HCI, a popular form of interaction, allows people to communicate verbally with machines. However, current speech-based HCI fails to capture deeper pointing information. This paper proposes Oriennet, which identifies the orientation of the human voice and achieves 95% accuracy in determining whether a person is facing the device.
As one of the most preferred forms of Human-Computer Interaction (HCI) today, speech-based HCI enables people to communicate verbally with machines, leveraging technologies such as speech recognition and speech synthesis. The current paradigm of speech-based HCI focuses only on the content of speech and fails to capture the deeper pointing information in voice interaction. In particular, when multiple smart voice devices are nearby and a person intends to interact with one of them, the lack of extra pointing information (comparable to the role played by gaze direction) causes unintended responses from the other devices, resulting in a poor interaction experience. Hence, an interesting question arises: can devices become aware of the orientation of the human voice using only acoustic speech signals? Little research has studied this topic, apart from a few preliminary works that leave much room for improvement. The main challenge lies in capturing the orientation information concealed within the speech signal while keeping the scheme practical and precise. In this paper, we propose Oriennet for identifying the orientation of the human voice. With a series of features designed around the indoor voice propagation model and the mouth radiation pattern, together with an attention mechanism, Oriennet achieves 95% accuracy in judging whether a person is facing the device. Even on the fine-grained task of classifying a person's orientation among 8 directions, it achieves an accuracy of 74%, far outperforming existing work. We validate the robustness of Oriennet under various conditions (noisy environments; different people, rooms, languages, and locations; fewer microphones), demonstrating its promising applicability in real-life scenarios.
