Article

Advances, challenges and opportunities in creating data for trustworthy AI

NATURE MACHINE INTELLIGENCE (2022)

Journal

NATURE MACHINE INTELLIGENCE

Volume 4, Issue 8, Pages 669-677

Publisher

NATURE PORTFOLIO

DOI: 10.1038/s42256-022-00516-1

Categories

Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications

Funding

NSF CAREER grant

Summary

Creating, using and maintaining high-quality annotated datasets for AI applications requires careful attention. This Perspective explores the challenges and best practices for different stages in the data-to-AI pipeline, aiming to promote a more data-centric approach. As AI moves from research to deployment, developing appropriate datasets and data pipelines for AI models becomes increasingly challenging. Publicly available automated AI model builders can now achieve top performance in many applications, but the design and curation of data for AI often rely on bespoke manual work, which critically affects the model's trustworthiness. This Perspective discusses key considerations from data design to data evaluation and highlights technical advances that make the data-for-AI pipeline more scalable and rigorous. It also explores how recent data regulations and policies can affect AI.

Abstract

It has rapidly become clear in the past few years that the creation, use and maintenance of high-quality annotated datasets for robust and reliable AI applications requires careful attention. This Perspective discusses challenges, considerations and best practices for various stages in the data-to-AI pipeline, to encourage a more data-centric approach. As artificial intelligence (AI) transitions from research to deployment, creating the appropriate datasets and data pipelines to develop and evaluate AI models is increasingly the biggest challenge. Automated AI model builders that are publicly available can now achieve top performance in many applications. In contrast, the design and sculpting of the data used to develop AI often rely on bespoke manual work, and they critically affect the trustworthiness of the model. This Perspective discusses key considerations for each stage of the data-for-AI pipeline, from data design to data sculpting (for example, cleaning, valuation and annotation) and data evaluation, to make AI more reliable. We highlight technical advances that help to make the data-for-AI pipeline more scalable and rigorous. Furthermore, we discuss how recent data regulations and policies can impact AI.
