4.1 Article

DescribeML: A dataset description tool for machine learning

期刊

SCIENCE OF COMPUTER PROGRAMMING
卷 231, 期 -, 页码 -

出版社

ELSEVIER
DOI: 10.1016/j.scico.2023.103030

关键词

Datasets; Machine learning; Model-driven engineering; Fairness; Domain-specific languages

向作者/读者索取更多资源

Datasets are crucial for training and evaluating machine learning models, but they can also lead to undesirable behaviors like biased predictions. To tackle this issue, the machine learning community suggests adopting consistent guidelines for dataset descriptions. However, these guidelines rely on natural language descriptions, which hinder automated computation and analysis. To overcome this, we present DescribeML, a language engineering tool that provides precise, structured descriptions of machine learning datasets, including their composition, provenance, and social concerns.
Datasets are essential for training and evaluating machine learning models. However, they are also the root cause of many undesirable model behaviors, such as biased predictions. To address this issue, the machine learning community is proposing as a best practice the adoption of common guidelines for describing datasets. However, these guidelines are based on natural language descriptions of the dataset, hampering the automatic computation and analysis of such descriptions. To overcome this situation, we present DescribeML, a language engineering tool to precisely describe machine learning datasets in terms of their composition, provenance, and social concerns in a structured format. The tool is implemented as a Visual Studio Code extension.(c) 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons .org /licenses /by-nc -nd /4 .0/).

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.1
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据