4.2 Article

A vector space modeling approach to spoken language identification

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASL.2006.876860

Keywords

acoustic segment models (ASMs); artificial neural network (ANN); spoken language identification; support vector machine (SVM); text categorization; vector space model (VSM)

We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). It is assumed that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure to circumvent the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector whose attributes represent the co-occurrence statistics of the acoustic units. As such, we can build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is demonstrated to give superior performance over a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% in the 1996 and 2003 LRE 30-s tasks, respectively, which are among the best results reported on these popular tasks.
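
The step from a decoded unit sequence to a feature vector can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes utterances have already been decoded into ASM unit labels (the labels `u1`, `u2`, `u3` and the language names `lang_a`, `lang_b` are made up), uses relative frequencies of unigrams and bigrams as the co-occurrence statistics, and substitutes a simple nearest-centroid cosine classifier for the SVM or ANN backends named in the keywords.

```python
from collections import Counter
from math import sqrt


def ngram_vector(units, n_max=2):
    """Map a decoded unit sequence to relative frequencies of its n-grams (n = 1..n_max)."""
    vec = Counter()
    for n in range(1, n_max + 1):
        total = len(units) - n + 1
        if total <= 0:
            continue  # sequence too short for this n-gram order
        counts = Counter(tuple(units[i:i + n]) for i in range(total))
        for gram, c in counts.items():
            vec[gram] = c / total
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average a list of sparse vectors into one per-language prototype."""
    total = Counter()
    for v in vectors:
        for k, x in v.items():
            total[k] += x
    return {k: x / len(vectors) for k, x in total.items()}


def classify(units, centroids):
    """Return the language whose centroid is most similar to the utterance vector."""
    vec = ngram_vector(units)
    return max(centroids, key=lambda lang: cosine(vec, centroids[lang]))


# Toy training data: hypothetical unit sequences, not real ASM decoder output.
train = {
    "lang_a": [["u1", "u2", "u1", "u2", "u3"], ["u1", "u2", "u3", "u1", "u2"]],
    "lang_b": [["u3", "u3", "u2", "u1", "u3"], ["u3", "u2", "u3", "u3", "u1"]],
}
centroids = {
    lang: centroid([ngram_vector(u) for u in utts]) for lang, utts in train.items()
}
predicted = classify(["u1", "u2", "u1", "u2"], centroids)
```

In a setup closer to the one described, the same sparse vectors would be passed to a discriminatively trained classifier rather than compared against centroids; the vectorization step is the part this sketch is meant to show.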
