4.7 Article

Speech Emotion Recognition Enhanced Traffic Efficiency Solution for Autonomous Vehicles in a 5G-Enabled Space-Air-Ground Integrated Intelligent Transportation System

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TITS.2021.3119921

Keywords

Speech recognition; Autonomous vehicles; Satellites; Emotion recognition; Next generation networking; Computer science; Communication networks; Speech emotion recognition; autonomous vehicles; artificial intelligence; 5G-enabled SAGIN; ITS

Funding

  1. National Natural Science Foundation of China [61373162]
  2. Sichuan Provincial Science and Technology Department Project [2019YFG0183]
  3. Sichuan Provincial Key Laboratory Project [KJ201402]
  4. Japan Society for the Promotion of Science (JSPS) [JP18K18044, JP21K17736]

Speech emotion recognition (SER) is becoming an important aspect of human-computer interaction for autonomous vehicles in the next generation of transportation systems. However, current vehicle-mounted SER systems have limitations in terms of communication network capacity and accuracy. To address these issues, a solution is proposed using a 5G-enabled space-air-ground integrated network to improve traffic efficiency and enhance the performance and user experience of autonomous vehicles.
Speech emotion recognition (SER) is becoming the main mode of human-computer interaction for autonomous vehicles in the next generation of intelligent transportation systems (ITSs). It can improve not only the safety of autonomous vehicles but also the personalized in-vehicle experience. However, current vehicle-mounted SER systems still suffer from two major shortcomings. First, the service capacity of the vehicle communication network is insufficient: it cannot meet the SER needs of autonomous vehicles in next-generation ITSs in terms of data transmission rate, power consumption, and latency. Second, the accuracy of SER is poor, so it cannot provide sufficient interactivity and personalization between users and vehicles. To address these issues, we propose an SER-enhanced traffic efficiency solution for autonomous vehicles in a 5G-enabled space-air-ground integrated network (SAGIN)-based ITS. First, we convert the vehicle speech data into spectrograms and input them into an AlexNet network model to obtain the high-level features of the vehicle speech acoustic model. At the same time, we convert the vehicle speech data into text and input it into the Bidirectional Encoder Representations from Transformers (BERT) model to obtain the high-level features of the corresponding text model. Finally, these two sets of high-level features are cascaded (concatenated) to obtain fused features, which are sent to a softmax classifier for emotion matching and classification. Experiments show that the proposed solution can improve not only the SAGIN's service capabilities, resulting in large capacity, high bandwidth, ultralow latency, and high reliability, but also the accuracy of vehicle SER as well as the performance, practicality, and user experience of the ITS.
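
The final fusion step described in the abstract (cascading the acoustic and text features, then classifying with softmax) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors below are random stand-ins for the AlexNet and BERT outputs, and the feature sizes, the emotion labels, the single linear layer, and the function names are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(acoustic_feat, text_feat, weights, biases):
    """Concatenate two feature vectors, apply one linear layer, then softmax."""
    fused = list(acoustic_feat) + list(text_feat)  # feature-level fusion
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical label set and feature sizes (not taken from the paper).
EMOTIONS = ["angry", "happy", "neutral", "sad"]
rng = random.Random(0)

# Random stand-ins for the high-level features of the two branches.
acoustic_feat = [rng.uniform(-1, 1) for _ in range(8)]  # spectrogram branch
text_feat = [rng.uniform(-1, 1) for _ in range(6)]      # transcript branch

# Untrained, randomly initialised classifier parameters.
weights = [[rng.uniform(-0.5, 0.5) for _ in range(14)] for _ in EMOTIONS]
biases = [0.0] * len(EMOTIONS)

probs = fuse_and_classify(acoustic_feat, text_feat, weights, biases)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

Because the weights are untrained, the predicted label is meaningless here; the sketch only shows how the two feature sets are joined into one vector before a single classifier produces a probability for each emotion class.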
