4.8 Article

Transformational machine learning: Learning how to learn from many related scientific problems

Publisher

NATL ACAD SCIENCES
DOI: 10.1073/pnas.2108013118

Keywords

AI; drug design; transfer learning; stacking; multitask learning

Funding

  1. Wallenberg AI, Autonomous Systems and Software Program (WASP) - Knut and Alice Wallenberg Foundation
  2. Engineering and Physical Sciences Research Council project Robot Chemist
  3. Alan Turing Institute project Spatial Learning: Applications in Structure Based Drug Design
  4. Engineering and Physical Sciences Research Council project Action on Cancer

Ask authors/readers for more resources

The transformational machine learning (TML) method, which transforms features by training ML models on other tasks, significantly improves predictive performance across various domains of machine learning. The TML features generally outperform intrinsic features, leading to enhanced scientific understanding and ecosystem-based approach to ML.
Almost all machine learning (ML) is based on representing examples using intrinsic features. When there aremultiple relatedML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we utilized thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. Use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and our similar to 50,000 ML models have been fully annotated with metadata, linked, and openly published using Findability, Accessibility, Interoperability, and Reusability principles (similar to 100 Gbytes).

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.8
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available