4.4 Article

A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features

Publisher

SPRINGER BIRKHAUSER
DOI: 10.1007/s00034-022-02277-z

Keywords

LPC; Linear prediction; MELP; Speech coding; Speech processing; Wideband speech coding

There is a growing demand for voice-activated applications due to the significant growth of mobile devices and services. It is important to capture individual speaker characteristics in addition to the relevant information in the speech signal. This paper proposes a wideband scalable bit rate speech coder that efficiently represents excitation using glottal instants and linear predictive coding based on the mel scale.
There has been significant growth in mobile devices and services, fuelling an increasing demand for voice-activated applications. In this context, it is important to capture individual speaker characteristics in addition to the salient information in the speech signal. Efficient speech coders are therefore attractive when they achieve two goals: a compact speech representation that maintains intelligibility and quality, and preservation of speaker-specific characteristics. This paper proposes a wideband scalable bit rate mixed excitation linear prediction (MELP)-enhanced speech coder that represents the excitation efficiently using glottal instants and uses linear predictive coding (LPC) based on the mel scale. The instantaneous pitch, or epoch, is included in the excitation to obtain an accurate estimate of the glottal instants, a vital parameter in speaker recognition. By optimizing the bit requirement using speech category-based coding, the proposed wideband coder operates at bit rates from 3.3 to 5.1 kbps, with an average bit rate of 3.6 kbps. At 3.6 kbps, the proposed coder provides perceptual quality, as measured by mean opinion score (MOS) and perceptual evaluation of speech quality (PESQ), similar to that of code excited linear prediction (CELP) operating at 6.4 kbps. In speaker recognition, the proposed coder gives an equal error rate of 12.5%, which the authors consider very promising.
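For readers unfamiliar with linear predictive coding, the following is a minimal sketch of the textbook autocorrelation method with the Levinson-Durbin recursion. It is not the coder described in the abstract: it works on a linear frequency scale rather than the mel scale, and it models no excitation, glottal instants, or bit allocation. The function names `autocorrelation` and `levinson_durbin` are hypothetical and chosen only for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1.0. The predictor is
    x_hat[n] = -sum(a[k] * x[n - k] for k in 1..order),
    and err is the remaining prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Illustrative input (an assumption, not data from the paper):
# a decaying exponential, which a first-order predictor fits well.
frame = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
```

For this synthetic frame, `a[1]` comes out close to -0.9 and `a[2]` close to zero, meaning each sample is predicted as roughly 0.9 times the previous one. A real coder would apply the analysis to short windowed frames of speech and quantize the resulting coefficients.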
