Article

Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2021)

Journal

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume 29, Issue -, Pages 1303-1317

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TASLP.2021.3060257

Keywords

Data models; Adaptation models; Direction-of-arrival estimation; Neural networks; Location awareness; Data collection; Robots; DOA estimation; data augmentation; sound source localization; weakly-supervised learning

Categories

Acoustics; Engineering, Electrical & Electronic

Funding

European Union under the EU Horizon 2020 Research and Innovation Action MuMMER Project (MultiModal Mall Entertainment Robot) [688147]

Summary

This paper proposes a novel approach to multi-speaker direction-of-arrival (DOA) estimation that combines data augmentation with weakly-supervised domain adaptation. Source domain data are generated by simulation, and real data are collected with weak labels only. The proposed method achieves performance similar to that obtained with fully labeled real data. The approach suggests an effective development procedure for applying DOA estimation models to new types of microphone arrays with minimal data collection effort.

Abstract

Deep neural networks have been successfully applied to sound direction-of-arrival (DOA) estimation under challenging conditions. However, such a learning-based approach requires a large amount of labeled training data, which is difficult to acquire. To address this problem, we propose a novel approach for multi-speaker DOA estimation with data augmentation and weakly-supervised domain adaptation. We generate source domain data by simulation, and collect real data annotated only with the number of sound sources as weak labels. The real data are further augmented by mixing single-source segments. Weakly-supervised domain adaptation is then applied to models pre-trained on the simulated data. We define a loss function for the adaptation process that exploits both the weak labels and the mixture component information in the augmented data. Experiments with real robot audio data show that the proposed approach achieves performance similar to that obtained when fully labeled real data are used. This paper suggests an effective development procedure for applying DOA estimation models to new types of microphone arrays with minimal data collection effort.
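The augmentation step described in the abstract (mixing single-source segments while keeping the source count as a weak label) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the dictionary layout with "audio", "count" and "components", the helper names mix_segments, augment and top_k_peaks, and the use of plain Python lists for multichannel audio. The last helper shows only one simple way a known source count could constrain decoding of per-direction scores. It is not the paper's adaptation loss, which the abstract does not specify.

```python
import random


def mix_segments(seg_a, seg_b, gain_b=1.0):
    """Sum two multichannel segments sample by sample.

    Each segment is a list of channels; each channel is a list of samples.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have the same number of channels")
    mixed = []
    for ch_a, ch_b in zip(seg_a, seg_b):
        n = min(len(ch_a), len(ch_b))  # truncate to the shorter channel
        mixed.append([ch_a[i] + gain_b * ch_b[i] for i in range(n)])
    return mixed


def augment(single_source_pool, rng):
    """Draw two distinct pool items and mix them into one training example.

    Each pool item is a dict with "audio" and "count" (the weak label).
    The mixture's weak label is the sum of the component counts, and the
    component indices are kept as mixture component information.
    """
    i, j = rng.sample(range(len(single_source_pool)), 2)
    a, b = single_source_pool[i], single_source_pool[j]
    return {
        "audio": mix_segments(a["audio"], b["audio"]),
        "count": a["count"] + b["count"],
        "components": (i, j),
    }


def top_k_peaks(scores, k):
    """Return sorted indices of the k largest local maxima on a circular grid.

    Hypothetical decoding step: "scores" stands for per-azimuth outputs of
    some model, and k is the source count given by the weak label.
    """
    n = len(scores)
    peaks = [
        i for i in range(n)
        if scores[i] > scores[(i - 1) % n] and scores[i] >= scores[(i + 1) % n]
    ]
    peaks.sort(key=lambda i: scores[i], reverse=True)
    return sorted(peaks[:k])


if __name__ == "__main__":
    # Two toy single-source segments: 2 channels, 2 samples each.
    pool = [
        {"audio": [[1.0, 2.0], [0.5, 0.5]], "count": 1},
        {"audio": [[0.5, -1.0], [0.5, 1.0]], "count": 1},
    ]
    example = augment(pool, random.Random(0))
    print(example["count"], example["audio"])
    print(top_k_peaks([0.1, 0.9, 0.2, 0.1, 0.7, 0.3], example["count"]))
```

Keeping the component indices alongside the mixture lets a later training step relate predictions on the mixture to predictions on its single-source parts, which is the kind of mixture component information the abstract mentions. How the paper actually uses that information is not stated in the text above.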
