Journal
INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS
Volume 12, Issue 12, Pages 3573-3583
Publisher
SPRINGER HEIDELBERG
DOI: 10.1007/s13042-021-01406-5
Keywords
Blind source separation (BSS); Underdetermined speech mixtures; Signal recovery; Dictionary learning; Sparse representation
Funding
- Key-Area Research and Development Program of Guangdong Province [2019B010121001, 2019B010118001]
- National Key Research and Development Project [2018YFB1802400]
- National Natural Science Foundation of China [62003095, 61673124, 61973087, 61773128, 62003094, 61803094, U1911401]
- Natural Science Foundation of Guangdong Province [2018A0303130080, 2019A1515011377]
This paper proposes a novel framework that solves the underdetermined blind source separation of speech mixtures using a compressed sensing model. The method comprises noise-reduction preprocessing, blind identification for accurate mixing matrix estimation, and simultaneous updating of codewords and coefficients for dictionary learning. The approach reduces computational complexity, and experimental results demonstrate its superiority.
Underdetermined blind source separation of speech mixtures is a challenging issue in the classical cocktail-party problem. Recently, dictionary learning has attracted attention as a way to solve this problem. In this paper, we build a novel framework that treats the underdetermined blind separation of speech mixtures as a sparse signal recovery problem by using a compressed sensing model. First, to eliminate the influence of additive white Gaussian noise, a wavelet transform with a tunable Q-factor is used as a noise-reduction preprocessing step. Second, to obtain an accurate estimate of the mixing matrix, a blind identification method is designed that identifies single-source data. Third, to find the best dictionary for representing the training signals, an arbitrary subset of codewords and the corresponding coefficients are updated simultaneously. In the source signal recovery stage, block processing is applied to the mixed signals, so that the source components are separated from each block by using sparse representation. The whole source signals are then reconstructed by concatenating the separated source components from all the blocks. This block-wise processing reduces the computational complexity. Finally, experimental results on separating underdetermined speech mixtures demonstrate the superiority of the proposed algorithm.
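The block-wise recovery stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the mixing matrix `A` is taken as already estimated, a fixed orthonormal DCT dictionary stands in for the learned dictionary, plain orthogonal matching pursuit (OMP) stands in for the paper's sparse solver, and blocks do not overlap. All function names (`dct_dictionary`, `omp`, `separate_blocks`) are hypothetical. For one block, the mixtures X = A S with each source s_j = D c_j can be stacked into a single compressed sensing system y = (A ⊗ D) c, which is what the code solves.

```python
import numpy as np


def dct_dictionary(L):
    """Orthonormal DCT-II basis (L x L); a stand-in for a learned dictionary."""
    n = np.arange(L)[:, None]
    k = np.arange(L)[None, :]
    D = np.sqrt(2.0 / L) * np.cos(np.pi * (n + 0.5) * k / L)
    D[:, 0] /= np.sqrt(2.0)  # rescale the constant atom to unit norm
    return D


def omp(Phi, y, n_nonzero, tol=1e-10):
    """Plain orthogonal matching pursuit: greedy sparse solution of y ~ Phi @ c."""
    norms = np.linalg.norm(Phi, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(Phi.shape[1])
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the column most correlated with the current residual.
        corr = np.abs(Phi.T @ residual) / norms
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Re-fit all selected coefficients by least squares.
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ sol
        if np.linalg.norm(residual) <= tol:
            break
    coef[support] = sol
    return coef


def separate_blocks(X, A, D, block_len, n_nonzero):
    """Recover n sources from m < n mixtures, one non-overlapping block at a time.

    X: (m, T) mixtures, A: (m, n) mixing matrix, D: (block_len, K) dictionary.
    Trailing samples that do not fill a whole block are left as zeros.
    """
    m, T = X.shape
    n = A.shape[1]
    K = D.shape[1]
    # Row-wise stacking of a block X_b = A S_b with s_j = D c_j gives
    # y = kron(A, D) @ [c_1; ...; c_n].
    Phi = np.kron(A, D)
    S = np.zeros((n, T))
    for start in range(0, T - block_len + 1, block_len):
        y = X[:, start:start + block_len].ravel()
        c = omp(Phi, y, n_nonzero)
        # Row j of the reshaped array is c_j; map it back to the time domain.
        S[:, start:start + block_len] = c.reshape(n, K) @ D.T
    return S
```

Calling `separate_blocks(X, A, dct_dictionary(64), block_len=64, n_nonzero=10)` on a 2 x T mixture with a 2 x 3 matrix `A` would return a 3 x T array of source estimates; the block length and sparsity level here are illustrative values, not taken from the paper. Because each block leads to a small, independent sparse recovery problem, the cost grows linearly with the number of blocks, which is consistent with the reduced computational complexity the abstract attributes to block processing.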