☆ 4.4 Article

A Wideband Scalable Bit Rate Mixed Excitation Linear Prediction-Enhanced Speech Coder by Preserving Speaker-Specific Features

CIRCUITS SYSTEMS AND SIGNAL PROCESSING (2023)

Journal

CIRCUITS SYSTEMS AND SIGNAL PROCESSING

Publisher

SPRINGER BIRKHAUSER

DOI: 10.1007/s00034-022-02277-z

Keywords

LPC; Linear prediction; MELP; Speech coding; Speech processing; Wideband speech coding

Category

Engineering, Electrical & Electronic

Abstract

There is a growing demand for voice-activated applications due to the significant growth of mobile devices and services. It is important to capture individual speaker characteristics in addition to the relevant information in the speech signal. This paper proposes a wideband scalable bit rate speech coder that efficiently represents excitation using glottal instants and linear predictive coding based on the mel scale.

Mobile devices and services have grown significantly, fuelling an increasing demand for voice-activated applications. In this context, it is important to capture individual speaker characteristics in addition to the salient information in the speech signal. Efficient speech coders are therefore attractive when they achieve two goals: a compact speech representation that maintains intelligibility and quality, and preservation of speaker-specific characteristics. This paper proposes a wideband scalable bit rate mixed excitation linear prediction-enhanced speech coder that represents the excitation efficiently using glottal instants and uses linear predictive coding based on the mel scale. The instantaneous pitch, or epoch, is included in the excitation to obtain an accurate estimate of the glottal instants, a vital parameter in speaker recognition. By optimizing the bit requirement through speech category-based coding, the proposed wideband coder operates at bit rates from 3.3 to 5.1 kbps, with an average bit rate of 3.6 kbps. At 3.6 kbps, the proposed coder provides perceptual quality, as measured by mean opinion score and perceptual evaluation of speech quality, similar to that of code excited linear prediction operating at 6.4 kbps. The performance of the proposed coder in speaker recognition is also analysed: it gives an equal error rate of 12.5%, a promising result.
