Proceedings Paper

DUAL-BRANCH ATTENTION-IN-ATTENTION TRANSFORMER FOR SINGLE-CHANNEL SPEECH ENHANCEMENT

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

Journal

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)

Pages 7847-7851

Publisher

IEEE

DOI: 10.1109/ICASSP43922.2022.9746273

Keywords

Speech enhancement; dual-branch; attention-in-attention; transformer

Categories

Acoustics; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic

Funding

National Natural Science Foundation of China [61631016]
National Key R&D Program of China [SQ2020YFF0426386]

Summary

This study proposes a dual-branch attention-in-attention transformer (DB-AIAT) to handle both coarse- and fine-grained regions of the spectrum. The proposed DB-AIAT achieves state-of-the-art performance on the Voice Bank + DEMAND dataset with a relatively small model size.

Abstract

Curriculum learning has begun to thrive in speech enhancement. It decouples the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by this, we propose a dual-branch attention-in-attention transformer, dubbed DB-AIAT, to handle both coarse- and fine-grained regions of the spectrum in parallel. The two branches are complementary. A magnitude masking branch coarsely estimates the overall magnitude spectrum, while a complex refining branch compensates for the missing spectral details and implicitly derives phase information. Within each branch, we propose a novel attention-in-attention transformer-based module that replaces conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module. These aim to capture long-term temporal-frequency dependencies and further aggregate global hierarchical contextual information. Experimental results on Voice Bank + DEMAND show that DB-AIAT outperforms previous advanced systems. It achieves state-of-the-art scores (3.31 PESQ, 95.6% STOI, and 10.79 dB SSNR) with a relatively small model size (2.81M parameters).
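The abstract describes the magnitude masking and complex refining branches as complementary, but it does not state how their outputs are combined. A common fusion rule is assumed here rather than quoted from the paper. It has three steps:

- Apply the real-valued mask to the noisy magnitude.
- Pair the result with the noisy phase.
- Add the complex refining branch's output as a residual.

The minimal pure-Python sketch below shows that arithmetic on a toy spectrum. The function `fuse_branches` and its argument layout are hypothetical.

```python
import cmath
from typing import List

Spectrum = List[List[complex]]  # complex STFT, indexed [frame][frequency bin]


def fuse_branches(noisy: Spectrum,
                  mag_mask: List[List[float]],
                  residual: Spectrum) -> Spectrum:
    """Hypothetical fusion of the two branch outputs.

    Assumptions (not stated in the abstract): the masking branch outputs a
    real-valued gain per time-frequency bin, the coarse estimate reuses the
    noisy phase, and the refining branch output is added as a complex residual.
    """
    enhanced: Spectrum = []
    for x_row, m_row, r_row in zip(noisy, mag_mask, residual):
        out_row = []
        for x, m, r in zip(x_row, m_row, r_row):
            # Coarse estimate: masked noisy magnitude paired with the noisy phase.
            coarse = abs(x) * m * cmath.exp(1j * cmath.phase(x))
            # Fine estimate: the complex residual restores spectral detail and
            # lets the final phase differ from the noisy phase.
            out_row.append(coarse + r)
        enhanced.append(out_row)
    return enhanced


noisy = [[3 + 4j, 1 - 1j]]
mask = [[0.5, 1.0]]
residual = [[0.1j, 0j]]
out = fuse_branches(noisy, mask, residual)
# Round away floating-point noise before printing.
print([[complex(round(v.real, 6), round(v.imag, 6)) for v in row] for row in out])
# [[(1.5+2.1j), (1-1j)]]
```

With a mask of 1 and a zero residual, the sketch returns the noisy bin unchanged. The refining branch therefore only has to model what the masked, noisy-phase estimate gets wrong.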

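The abstract does not specify how an adaptive temporal-frequency attention block combines attention along time and attention along frequency. The pure-Python sketch below assumes one plausible reading:

- Single-head scaled dot-product self-attention runs along the time axis (per frequency bin).
- The same attention runs along the frequency axis (per frame), in parallel with the first.
- The two outputs are mixed with scalar weights `alpha` and `beta`, which stand in for learnable parameters, and added to the input as a residual.

Several parts of a real transformer block are omitted: learned query/key/value projections, multiple heads, feed-forward layers, normalization, and the adaptive hierarchical attention module. The function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Tensor = List[List[Vec]]  # feature map indexed [time][freq][channel]


def self_attention(seq: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention with Q = K = V = seq."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[c] for wi, v in zip(w, seq)) / z for c in range(d)])
    return out


def adaptive_tf_attention(x: Tensor, alpha: float, beta: float) -> Tensor:
    """Hypothetical parallel time/frequency attention with weighted fusion."""
    T, F = len(x), len(x[0])
    # Temporal path: attend across frames, separately for each frequency bin.
    t_out: Tensor = [[[] for _ in range(F)] for _ in range(T)]
    for f in range(F):
        column = self_attention([x[t][f] for t in range(T)])
        for t in range(T):
            t_out[t][f] = column[t]
    # Frequency path: attend across bins, separately for each frame.
    f_out: Tensor = [self_attention(x[t]) for t in range(T)]
    # Residual connection plus weighted sum of the two attention paths.
    return [[[x[t][f][c] + alpha * t_out[t][f][c] + beta * f_out[t][f][c]
              for c in range(len(x[t][f]))]
             for f in range(F)]
            for t in range(T)]


x = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
y = adaptive_tf_attention(x, alpha=0.5, beta=0.5)
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
```

Attending along one axis at a time needs on the order of T²F + TF² score computations. Joint attention over all T·F time-frequency bins needs (TF)². This saving is a common motivation for factorizing attention by axis on spectrogram features.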