Article

FSD50K: An Open Dataset of Human-Labeled Sound Events

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2022)

Journal

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume 30, Issue -, Pages 829-852

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TASLP.2021.3133208

Keywords

Videos; Task analysis; Labeling; Vocabulary; Speech recognition; Ontologies; Benchmark testing; Audio dataset; sound event; recognition; classification; tagging; data collection; environmental sound

Categories

Acoustics; Engineering, Electrical & Electronic

Funding

European Union [688382]
Google Faculty Research Awards
Maria de Maeztu Units of Excellence Programme [MDM-2015-0502]

Summary

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific. AudioSet, built on over 2M tracks from YouTube videos, is the exception, but it is not an open dataset. As an alternative, the authors introduce FSD50K, an open dataset of over 51k audio clips manually labeled with 200 classes from the AudioSet Ontology. The clips carry Creative Commons licenses, so the dataset is freely distributable, and the paper provides a detailed description and analysis of it. Baseline systems and insights on splitting data for SER are also presented.

Abstract

Most existing datasets for sound event recognition (SER) are relatively small and/or domain-specific, with the exception of AudioSet, based on over 2 M tracks from YouTube videos and encompassing over 500 sound classes. However, AudioSet is not an open dataset as its official release consists of pre-computed audio features. Downloading the original audio tracks can be problematic due to YouTube videos gradually disappearing and usage rights issues. To provide an alternative benchmark dataset and thus foster SER research, we introduce FSD50K, an open dataset containing over 51 k audio clips totalling over 100 h of audio manually labeled using 200 classes drawn from the AudioSet Ontology. The audio clips are licensed under Creative Commons licenses, making the dataset freely distributable (including waveforms). We provide a detailed description of the FSD50K creation process, tailored to the particularities of Freesound data, including challenges encountered and solutions adopted. We include a comprehensive dataset characterization along with discussion of limitations and key factors to allow its audio-informed usage. Finally, we conduct sound event classification experiments to provide baseline systems as well as insight on the main factors to consider when splitting Freesound audio data for SER. Our goal is to develop a dataset to be widely adopted by the community as a new open benchmark for SER research.
