Proceedings Paper

NEURAL AUDIO-TO-SCORE MUSIC TRANSCRIPTION FOR UNCONSTRAINED POLYPHONY USING COMPACT OUTPUT REPRESENTATIONS

Publisher

IEEE
DOI: 10.1109/ICASSP43922.2022.9746239

Keywords

Audio-to-Score Transcription; Connectionist Temporal Classification; Unconstrained Polyphony

Funding

  1. MCIN/AEI [PID2020-118447RA-I00]
  2. Valencian Government / EU FEDER [IDIFEDER/2020/003]

Abstract

Neural Audio-to-Score (A2S) Music Transcription systems have shown promising results with pieces containing a fixed number of voices. However, they still exhibit fundamental limitations that constrain their applicability in wider scenarios. This work aims at tackling two of them: we introduce a novel output representation which addresses shortcomings related to the sequence-based A2S recognition framework, and we report a first approximation to dealing with unconstrained polyphony. This is validated on a Convolutional Recurrent Neural Network (CRNN) with a Connectionist Temporal Classification (CTC) A2S scheme, using synthetic audio from string quartets and piano sonatas with intricate polyphonic mixtures. Our results, which improve the fixed-polyphony state-of-the-art rates, may be considered a reference for future A2S works dealing with an unconstrained number of voices.
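The abstract names a CRNN trained with CTC as the recognition backbone. As a rough illustration only, the sketch below shows what such a CRNN-CTC transcription pipeline can look like in PyTorch; the layer sizes (N_BINS, HIDDEN, VOCAB) are invented placeholders, and the flat symbol vocabulary does not reproduce the paper's compact output representation or its handling of unconstrained polyphony.

```python
# A minimal CRNN-CTC sketch in PyTorch. This is NOT the authors' exact
# architecture or output representation; layer sizes and the vocabulary
# are placeholder assumptions for illustration.
import torch
import torch.nn as nn

N_BINS = 229   # spectrogram frequency bins (assumed)
HIDDEN = 256   # LSTM hidden size (assumed)
VOCAB = 100    # score-symbol vocabulary size, excluding the CTC blank (assumed)

class CRNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional front end: local spectro-temporal features.
        # Pooling halves the frequency axis twice but keeps full time
        # resolution, so CTC still gets one prediction per input frame.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        # Recurrent back end: sequence modelling over frames.
        self.rnn = nn.LSTM(64 * (N_BINS // 4), HIDDEN, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * HIDDEN, VOCAB + 1)  # +1 for the CTC blank

    def forward(self, spec):                    # spec: (batch, 1, N_BINS, frames)
        x = self.conv(spec)                     # (batch, 64, N_BINS // 4, frames)
        x = x.permute(0, 3, 1, 2).flatten(2)    # (batch, frames, features)
        x, _ = self.rnn(x)
        return self.out(x)                      # (batch, frames, VOCAB + 1)

model = CRNN()
ctc = nn.CTCLoss(blank=VOCAB)                   # last class index is the blank
spec = torch.randn(2, 1, N_BINS, 400)           # dummy batch: 2 clips, 400 frames
log_probs = model(spec).log_softmax(-1).permute(1, 0, 2)  # (frames, batch, classes)
targets = torch.randint(0, VOCAB, (2, 50))      # dummy target symbol sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 400, dtype=torch.long),
           target_lengths=torch.full((2,), 50, dtype=torch.long))
loss.backward()
```

At inference time, a greedy or beam-search CTC decoder collapses repeated symbols and removes blanks to recover the predicted score sequence from the frame-level outputs.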
