Journal
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Volume: -, Issue: -, Pages: 4603-4607
Publisher
IEEE
DOI: 10.1109/ICASSP43922.2022.9746239
Keywords
Audio-to-Score Transcription; Connectionist Temporal Classification; Unconstrained Polyphony
Funding
- MCIN/AEI [PID2020-118447RA-I00]
- FEDER [IDIFEDER/2020/003]
- Valencian Government [IDIFEDER/2020/003]
Abstract
Neural Audio-to-Score (A2S) Music Transcription systems have shown promising results on pieces containing a fixed number of voices. However, they still exhibit fundamental limitations that constrain their applicability to wider scenarios. This work tackles two of them: we introduce a novel output representation that addresses shortcomings of the sequence-based A2S recognition framework, and we present a first approach to handling unconstrained polyphony. This is validated on a Convolutional Recurrent Neural Network (CRNN) with a Connectionist Temporal Classification (CTC) A2S scheme, using synthetic audio from string quartets and piano sonatas with intricate polyphonic mixtures. Our results, which improve on fixed-polyphony state-of-the-art rates, may serve as a reference for future A2S work dealing with an unconstrained number of voices.
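For readers unfamiliar with the CTC scheme the abstract mentions: a CTC-trained network emits one label per audio frame (including a special "blank"), and decoding collapses consecutive repeats and drops blanks to recover the symbol sequence. The sketch below is purely illustrative of generic CTC greedy decoding, not the paper's implementation; the `BLANK` symbol and note labels are assumptions for the example.

```python
# Illustrative sketch of CTC greedy decoding (not the paper's code).
# The network outputs one label per frame; decoding first collapses
# consecutive repeated labels, then removes blank symbols.

BLANK = "-"  # hypothetical blank symbol used in this sketch

def ctc_greedy_decode(frame_labels):
    """Collapse consecutive repeats, then drop blanks."""
    decoded = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            decoded.append(lab)
        prev = lab
    return decoded

# Frame-wise argmax output for two C4 notes followed by an E4;
# the blank between the C4 runs keeps the repeated note distinct:
print(ctc_greedy_decode(["C4", "C4", "-", "C4", "E4", "E4", "-"]))
# → ['C4', 'C4', 'E4']
```

Note how the blank separating the two `C4` runs is what allows a genuinely repeated note to survive the collapse step.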