Article

Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data

The past decade has witnessed impressive developments in predictive chemistry and reaction informatics driven by machine learning applications in computer-aided synthesis planning. However, in order to further advance the role of AI in this field, significant improvements in reporting and curating reaction data are needed. Currently, publicly available data is mostly unstructured and biased towards high-yielding reactions, limiting the types of models that can be trained successfully. This Perspective analyzes successful initiatives in data curation and sharing in chemistry and molecular biology, highlighting the Open Reaction Database and proposing actions to make reaction data more FAIR.

The past decade has seen a number of impressive developments in predictive chemistry and reaction informatics driven by machine learning applications to computer-aided synthesis planning. While many of these developments have been made even with relatively small, bespoke data sets, in order to advance the role of AI in the field at scale, there must be significant improvements in the reporting of reaction data. Currently, the majority of publicly available data is reported in an unstructured format and heavily imbalanced toward high-yielding reactions, which influences the types of models that can be successfully trained. In this Perspective, we analyze several data curation and sharing initiatives that have seen success in chemistry and molecular biology. We discuss several factors that have contributed to their success and how we can take lessons from these case studies and apply them to reaction data. Finally, we spotlight the Open Reaction Database and summarize key actions the community can take toward making reaction data more findable, accessible, interoperable, and reusable (FAIR), including the use of mandates from funding agencies and publishers.
