☆ 4.6 Article

A lightweight speech recognition method with target-swap knowledge distillation for Mandarin air traffic control communications

PEERJ COMPUTER SCIENCE (2023)

Journal

PEERJ COMPUTER SCIENCE

Volume 9, Issue -, Pages -

Publisher

PEERJ INC

DOI: 10.7717/peerj-cs.1650

Keywords

Automatic speech recognition; Knowledge distillation; Air traffic control communications; Model compression; Mandarin ASR; Lightweight ASR

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods

Summary

This article introduces knowledge distillation into automatic speech recognition (ASR) for Mandarin air traffic control (ATC) communications to improve the generalization of a lightweight model. The Target-Swap Knowledge Distillation (TSKD) strategy mitigates the teacher model's potential overconfidence in the target class. Experimental results show that the resulting lightweight ASR model balances recognition accuracy and transcription latency.

Abstract

Miscommunications between air traffic controllers (ATCOs) and pilots in air traffic control (ATC) may lead to catastrophic aviation accidents. Thanks to advances in speech and language processing, automatic speech recognition (ASR) is an appealing approach to preventing misunderstandings. To give ATCOs and pilots enough time to respond effectively, ASR systems for ATC must combine strong recognition performance with low transcription latency. However, most existing ASR work for ATC focuses on recognition performance and pays little attention to recognition speed, which motivates the research in this article. To address this issue, this article introduces knowledge distillation into ASR for Mandarin ATC communications to improve the generalization performance of the lightweight model. Specifically, we propose a simple yet effective lightweight strategy, named Target-Swap Knowledge Distillation (TSKD), which swaps the logit outputs of the teacher and student models for the target class. This swap can mitigate the teacher model's potential overconfidence in the target class and lets the student model concentrate on distilling knowledge from the non-target classes. Extensive experiments demonstrate the effectiveness of the proposed TSKD in both homogeneous and heterogeneous architectures. The experimental results show that the resulting lightweight ASR model achieves a balance between recognition accuracy and transcription latency.
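
The target-swap step described in the abstract can be illustrated with a minimal sketch. The abstract states only that the teacher and student logits for the target class are swapped; everything else here is an assumption made for illustration: the function names (`log_softmax`, `swap_target_logits`, `kd_loss`, `tskd_loss`), the use of a temperature-softened KL divergence as the distillation loss (the conventional knowledge distillation form), and the per-frame, single-target setup. It is not the authors' implementation.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_sum for z in scaled]

def swap_target_logits(teacher_logits, student_logits, target):
    """Return copies of both logit vectors with the target-class entries exchanged."""
    teacher = list(teacher_logits)
    student = list(student_logits)
    teacher[target], student[target] = student[target], teacher[target]
    return teacher, student

def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on softened distributions (assumed loss form)."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def tskd_loss(teacher_logits, student_logits, target, temperature=1.0):
    """Hypothetical TSKD loss: swap the target-class logits, then distill."""
    teacher, student = swap_target_logits(teacher_logits, student_logits, target)
    return kd_loss(teacher, student, temperature)

# Illustrative values only: a teacher that is very confident about class 0.
teacher_logits = [8.0, 1.0, 0.5]
student_logits = [2.0, 1.5, 0.2]
swapped_teacher, swapped_student = swap_target_logits(teacher_logits, student_logits, 0)
print(swapped_teacher)  # [2.0, 1.0, 0.5]
print(swapped_student)  # [8.0, 1.5, 0.2]
print(tskd_loss(teacher_logits, student_logits, target=0, temperature=2.0))
```

After the swap, the teacher's distribution no longer carries its own large target-class logit, so under this reading the remaining mismatch between the two distributions comes mainly from the non-target classes. How the swapped logits are treated during backpropagation and how this term is combined with the recognition loss are not specified in the abstract and are left out of the sketch.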
