4.6 Article

Calculating the distance between languages with deep learning

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.bspc.2023.105686

Keywords

Accent similarity; Deep learning; Siamese network

This paper proposes a computational method for calculating the similarity between different languages or language varieties and validates it experimentally. The results show that the proposed model outperforms the comparative human-listener experiment on the identification task and can help linguists pre-classify sound files.
Artificial intelligence (AI) has been applied in various fields, including speech recognition. This paper proposes a computational method for calculating the similarity between different languages or language varieties, with similarity represented as a distance. We extracted mel spectrogram features from speech signals to provide the feature vectors and derived pairs of signal tokens from those vectors. We then trained a Siamese time-delay neural network to calculate the distance between two signal tokens. If both tokens in a pair are from the same language group, the distance obtained using this Siamese network model is zero. In this preliminary experiment, three types of regional Mandarin Chinese (BJ, FJ, GD) were used as the dataset. The classification task yielded F1-scores of 0.794, 0.623, and 0.715 for the BJ, FJ, and GD datasets, respectively. In addition, 10 Taiwan Mandarin (TM) native speakers took part in an identification experiment and a pair-wise discrimination experiment to allow comparison with the Siamese network model. The 10 TM natives tended to misidentify GD-accented Mandarin as FJ-accented Mandarin, resulting in a much greater distance between the two in the FJ-GD discrimination task compared to the Siamese network model. Overall, the results show that our model outperforms the comparative experiment on the identification task, with a mean absolute error (MAE) of 0.35 between the distances from the Siamese network model and those from the experiment. Familiarity might be the reason why the 10 TM natives displayed a bias towards BJ-accented Mandarin Chinese. In sum, we provide a computational method for calculating the distance between two languages or language varieties, which can help linguists pre-classify sound files.
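
The distance and MAE computations mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the training objective, so the contrastive loss below is an assumption, chosen because it pushes same-group pairs toward a distance of zero. The function names, the `margin` parameter, and the toy vectors are hypothetical and stand in for embeddings that a trained Siamese network would produce.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(distance, same_group, margin=1.0):
    """Contrastive loss for one token pair (assumed objective, not from the paper).

    Same-group pairs are penalized by their squared distance, so the loss is
    minimal at distance zero. Different-group pairs are penalized only when
    they are closer than `margin`.
    """
    if same_group:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length lists of distances."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    # Toy embeddings only; real ones would come from the two network branches.
    token_a = [0.0, 0.0]
    token_b = [3.0, 4.0]
    d = euclidean_distance(token_a, token_b)
    print("distance:", d)
    print("loss if same group:", contrastive_loss(d, same_group=True))
    print("loss if different group:", contrastive_loss(d, same_group=False))
```

In this sketch, `mean_absolute_error` would compare model distances with distances derived from listener judgments for the same accent pairs, which is one plausible reading of how the reported MAE relates the two.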
