4.7 Article

Data Sharing in Chemistry: Lessons Learned and a Case for Mandating Structured Reaction Data

Related References

Note: Only some of the references are listed here; download the original article for the full reference information.

Article Chemistry, Medicinal

Call for a Public Open Database of All Chemical Reactions

Pierre Baldi

Summary: Currently, there is no publicly available comprehensive database of all known chemical reactions and associated information, but establishing such a database is crucial for the development of chemical sciences and technologies, as well as for leveraging the power of modern AI and machine learning methods. While an international consortium would ideally create and maintain this repository, starting the process through governmental agencies like the National Science Foundation or the National Institutes of Health might be more feasible in the near future, using a multipronged approach that involves negotiations with commercial stakeholders, crowd-sourcing actions, automated extraction methods, and legislative actions.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Chemistry, Medicinal

Improving Few- and Zero-Shot Reaction Template Prediction Using Modern Hopfield Networks

Philipp Seidl et al.

Summary: Finding synthesis routes for molecules is crucial for drug and material discovery. This study introduces a template-based single-step retrosynthesis model, which significantly improves template relevance prediction by learning an encoding of molecules and reaction templates.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Chemistry, Medicinal

Prediction of the Chemical Context for Buchwald-Hartwig Coupling Reactions

Samuel Genheden et al.

Summary: This study presents machine learning models for predicting the chemical context in Buchwald-Hartwig coupling reactions. The models show excellent accuracy and sensitivity, with the multi-label model outperforming the single-label model. However, it is important to periodically re-train the models due to the temporal characteristic of context usage.

MOLECULAR INFORMATICS (2022)

News Item Multidisciplinary Sciences

NIH ISSUES A SEISMIC MANDATE: SHARE DATA PUBLICLY

Max Kozlov

Summary: The data-sharing policy has the potential to establish a global standard for biomedical research, according to scientists, but they have concerns regarding logistics and equity.

NATURE (2022)

Article Chemistry, Medicinal

Automated Chemical Reaction Extraction from Scientific Literature

Jiang Guo et al.

Summary: Access to structured chemical reaction data is crucial for chemists in bench experiments and applications like computer-aided drug design. This study focuses on developing automated methods for extracting reactions from chemical literature. Two-stage deep learning models based on Transformer are utilized, achieving high performance and data efficiency with only hundreds of annotated reactions.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2022)

Article Health Care Sciences & Services

Many researchers were not compliant with their published data sharing statement: a mixed-methods study

Mirko Gabelica et al.

Summary: This study analyzed whether researchers complied with the data availability statements (DAS) in their manuscripts published in open-access journals. The results suggest that even when authors state a willingness to share data, the actual compliance rate is the same as for authors who provide no DAS, indicating that a DAS alone may not be sufficient to ensure data sharing.

JOURNAL OF CLINICAL EPIDEMIOLOGY (2022)

Article Chemistry, Multidisciplinary

Predicting reaction conditions from limited data through active transfer learning

Eunjae Shim et al.

Summary: This article demonstrates the use of transfer learning and active learning to accelerate the development of new chemical reactions. Specifically tuned machine learning models based on random forest classifiers are used to expand the applicability of Pd-catalyzed cross-coupling reactions to new types of nucleophiles. The results show that model transfer is effective even when trained on relatively small amounts of data. Additionally, a model simplification scheme and an active transfer learning strategy are introduced to improve the predictive capability of the models.

CHEMICAL SCIENCE (2022)

Article Computer Science, Artificial Intelligence

Chemformer: a pre-trained transformer for computational chemistry

Ross Irwin et al.

Summary: This study presents a Transformer-based model, called Chemformer, which can be quickly applied to sequence-to-sequence and discriminative cheminformatics tasks. The use of self-supervised pre-training improves performance and speeds up convergence on downstream tasks. The model achieves state-of-the-art results for accuracy in the field of cheminformatics.

MACHINE LEARNING-SCIENCE AND TECHNOLOGY (2022)

Article Chemistry, Medicinal

Multilabel Classification Models for the Prediction of Cross-Coupling Reaction Conditions

Michael R. Maser et al.

Summary: Machine-learned ranking models have been developed for predicting substrate-specific cross-coupling reaction conditions. Graph encodings and gradient-boosting machines were found to be very effective for this learning task, with a novel reaction-level graph attention operation in the top-performing model.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2021)

Review Chemistry, Multidisciplinary

The Open Reaction Database

Steven M. Kearnes et al.

Summary: The study introduces the Open Reaction Database (ORD) schema and infrastructure for structuring and sharing organic reaction data, providing a centralized data repository on GitHub. This consistent data representation and infrastructure aim to enhance the development of computer-aided synthesis planning, reaction prediction, and other predictive chemistry tasks.

JOURNAL OF THE AMERICAN CHEMICAL SOCIETY (2021)

Article Chemistry, Medicinal

ReactionDataExtractor: A Tool for Automated Extraction of Information from Chemical Reaction Schemes

Damian M. Wilary et al.

Summary: This work presents ReactionDataExtractor, a software tool for automatic extraction of information from multistep reaction schemes. It uses a combination of rules and unsupervised machine-learning approaches to identify various components in the reaction schemes. The tool achieved precision and recall metrics of 67% to 91% in data extraction.

JOURNAL OF CHEMICAL INFORMATION AND MODELING (2021)

Article Biochemistry & Molecular Biology

PubChem in 2021: new data content and improved web interfaces

Sunghwan Kim et al.

Summary: PubChem, a popular chemical information resource, has made substantial improvements in the past two years by adding data from over 100 new sources, updating its homepage and record pages, introducing new services like the Periodic Table and Pathway pages, and creating a special data collection related to COVID-19 and SARS-CoV-2 in response to the pandemic.

NUCLEIC ACIDS RESEARCH (2021)

Editorial Material Chemistry, Organic

Encouraging Submission of FAIR Data at The Journal of Organic Chemistry and Organic Letters

Angela M. Hunter et al.

JOURNAL OF ORGANIC CHEMISTRY (2020)

Article Chemistry, Medicinal

Ring Breaker: Neural Network Driven Synthesis Prediction of the Ring System Chemical Space

Amol Thakkar et al.

JOURNAL OF MEDICINAL CHEMISTRY (2020)

Editorial Material Chemistry, Organic

Toward FAIRness and a User-Friendly Repository for Supporting NMR Data

Barbara C. Sorkin et al.

JOURNAL OF ORGANIC CHEMISTRY (2020)

Article Chemistry, Multidisciplinary

Molecular Machine Learning: The Future of Synthetic Chemistry?

Philipp M. Pflueger et al.

ANGEWANDTE CHEMIE-INTERNATIONAL EDITION (2020)

Article Chemistry, Multidisciplinary

AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning

Samuel Genheden et al.

JOURNAL OF CHEMINFORMATICS (2020)

Article Multidisciplinary Sciences

State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis

Igor V. Tetko et al.

NATURE COMMUNICATIONS (2020)

Article Chemistry, Multidisciplinary

Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy

Philippe Schwaller et al.

CHEMICAL SCIENCE (2020)

Article Multidisciplinary Sciences

The digitization of organic synthesis

Ian W. Davies

NATURE (2019)

Article Multidisciplinary Sciences

A robotic platform for flow synthesis of organic compounds informed by AI planning

Connor W. Coley et al.

SCIENCE (2019)

Article Chemistry, Multidisciplinary

Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction

Philippe Schwaller et al.

ACS CENTRAL SCIENCE (2019)

Article Biochemistry & Molecular Biology

PubChem 2019 update: improved access to chemical data

Sunghwan Kim et al.

NUCLEIC ACIDS RESEARCH (2019)

Review Chemistry, Multidisciplinary

Machine Learning in Computer-Aided Synthesis Planning

Connor W. Coley et al.

ACCOUNTS OF CHEMICAL RESEARCH (2018)

Article Multidisciplinary Sciences

Predicting reaction performance in C-N cross-coupling using machine learning

Derek T. Ahneman et al.

SCIENCE (2018)

Article Multidisciplinary Sciences

Planning chemical syntheses with deep neural networks and symbolic AI

Marwin H. S. Segler et al.

NATURE (2018)

Review Crystallography

Should we remediate small molecule structures? If so, who should do it?

Carl H. Schwalbe

CRYSTALLOGRAPHY REVIEWS (2018)

Article Chemistry, Multidisciplinary

Using Machine Learning To Predict Suitable Conditions for Organic Reactions

Hanyu Gao et al.

ACS CENTRAL SCIENCE (2018)

Article Biochemistry & Molecular Biology

Protein Ontology (PRO): enhancing and scaling up the representation of protein entities

Darren A. Natale et al.

NUCLEIC ACIDS RESEARCH (2017)

Article Biochemistry & Molecular Biology

PubChem BioAssay: 2017 update

Yanli Wang et al.

NUCLEIC ACIDS RESEARCH (2017)

Article Chemistry, Multidisciplinary

Electronic lab notebooks: can they replace paper?

Samantha Kanza et al.

JOURNAL OF CHEMINFORMATICS (2017)

Article Chemistry, Multidisciplinary

Prediction of Organic Reaction Outcomes Using Machine Learning

Connor W. Coley et al.

ACS CENTRAL SCIENCE (2017)

Review Biochemistry & Molecular Biology

Dissemination of original NMR data enhances reproducibility and integrity in chemical research

Jonathan Bisson et al.

NATURAL PRODUCT REPORTS (2016)

Article Chemistry, Multidisciplinary

PubChemRDF: towards the semantic annotation of PubChem compound and substance databases

Gang Fu et al.

JOURNAL OF CHEMINFORMATICS (2015)

Article Multidisciplinary Sciences

Nanomole-scale high-throughput chemistry for the synthesis of complex molecules

Alexander Buitrago Santanilla et al.

SCIENCE (2015)

Article Biochemistry & Molecular Biology

PubChem's BioAssay Database

Yanli Wang et al.

NUCLEIC ACIDS RESEARCH (2012)

Article Chemistry, Multidisciplinary

Deducing chemical structure from crystallographically determined atomic coordinates

Ian J. Bruno et al.

ACTA CRYSTALLOGRAPHICA SECTION B-STRUCTURAL SCIENCE (2011)

Article Chemistry, Multidisciplinary

Space groups P1 and Cc: how are they doing?

Richard E. Marsh

ACTA CRYSTALLOGRAPHICA SECTION B-STRUCTURAL SCIENCE CRYSTAL ENGINEERING AND MATERIALS (2009)

Article Biochemistry & Molecular Biology

PubChem: a public information system for analyzing bioactivities of small molecules

Yanli Wang et al.

NUCLEIC ACIDS RESEARCH (2009)

Article Chemistry, Multidisciplinary

Further conventions for NMR shielding and chemical shifts (IUPAC recommendations 2008)

Robin K. Harris et al.

PURE AND APPLIED CHEMISTRY (2008)

Article Biochemistry & Molecular Biology

The Universal Protein Resource (UniProt)

Amos Bairoch et al.

NUCLEIC ACIDS RESEARCH (2008)

Article Chemistry, Multidisciplinary

NMR nomenclature. Nuclear spin properties and conventions for chemical shifts - (IUPAC recommendations 2001)

RK Harris et al.

PURE AND APPLIED CHEMISTRY (2001)