Article

Comparative Analysis of Deep Learning Architectures and Vision Transformers for Musical Key Estimation

INFORMATION (2023)

Journal

INFORMATION

Volume 14, Issue 10, Pages -

Publisher

MDPI

DOI: 10.3390/info14100527

Keywords

music information retrieval (MIR); musical key estimation; deep learning; vision transformers; convolutional neural networks (CNNs)

Category

Computer Science, Information Systems

Summary

This paper presents a comprehensive comparison between deep learning architectures and vision transformers for musical key estimation. The results show that DenseNet achieves remarkable accuracy, while vision transformers perform better on temporal metrics. The findings support the development of accurate and efficient algorithms for music recommendation systems and automatic music transcription, and they offer insights for practical implementations.

Abstract

The musical key is a crucial element of a piece: it offers insight into the tonal center, harmonic structure, and chord progressions, and it enables tasks such as transposition and arrangement. Accurate key estimation also has practical applications in music recommendation systems and automatic music transcription, making it relevant to both academia and industry. This paper presents a comprehensive comparison between standard deep learning architectures and emerging vision transformers, which have proven successful in various domains. We evaluate six deep learning models on a specific subset of the GTZAN dataset. Our results show that DenseNet, a conventional deep learning architecture, achieves an accuracy of 91.64%, outperforming the vision transformers. We then analyze the temporal characteristics of each model. Notably, the vision transformer and SWIN transformer show a slight decrease in overall performance (1.82% and 2.29%, respectively), yet they outperform the DenseNet architecture on temporal metrics. These findings contribute to the field of musical key estimation, where accurate and efficient algorithms play a pivotal role. By examining the strengths and weaknesses of deep learning architectures and vision transformers, we gain insights for practical implementations, particularly in music recommendation systems and automatic music transcription. Our research provides a foundation for future work and encourages further exploration in this area.
