Unleashing high content screening in hit detection-Benchmarking AI workflows including novelty detection

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2022)

Journal

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL

Volume 20, Pages 5453-5465

Publisher

ELSEVIER

DOI: 10.1016/j.csbj.2022.09.023

Keywords

High-content screening; Machine learning; Deep learning; Classifier; Novelty detection; Bioactives; Hit detection; Cell painting

Automated Summary

Complex mixtures containing natural products are a valuable source of novel drug candidates. High content screening is commonly used to identify potential bioactive compounds. Machine learning algorithms play a crucial role in analyzing the large amounts of data generated by these assays and can predict the biological activity of samples.

Abstract

Complex mixtures containing natural products remain an interesting source of novel drug candidates. High content screening (HCS) is a popular tool for screening such mixtures. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles but also generate large amounts of data. Yet only a few machine learning (ML) applications for data analysis are available, and these usually require profound knowledge of the underlying cell biology. Unfortunately, there are no applications that simply predict whether samples are biologically active or not (any kind of bioactivity). In this work, we benchmark ML algorithms for binary classification. We start with classical ML models, which are the standard classifiers of the scikit-learn library or ensemble models of these classifiers (92 models tested in total), followed by partial least squares regression (PLSR)-based classification (44 models tested in total) and simple artificial neural networks (ANNs) with dense layers (72 models tested in total). In addition, novelty detection (ND), which is intended to handle unknown patterns, was examined. For the final analysis, the models, with and without upstream ND, were tested on two independent data sets. In our analysis, a stacking model, an ensemble model of classical ML algorithms, performed best at predicting new and unknown data. ND improved the predictions of the models and was useful for handling unknown patterns. Importantly, the classifier presented here can be easily rebuilt and adapted to the data and demands of other groups. The hit detector (ND + stacking model) is universal and suitable for broader application to support the search for new drug candidates.

(c) 2022 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
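To make the "ND + stacking model" idea concrete, the following is a minimal sketch built on scikit-learn, not the authors' implementation. The abstract does not name the novelty detector, the base classifiers, or how the two stages are combined, so everything here is an assumption for illustration: `LocalOutlierFactor` as the novelty detector, a random forest and an SVC stacked under a logistic regression, synthetic data in place of HCS feature profiles, and the rule that samples flagged as novel are reported as active. The helper names `fit_hit_detector` and `predict_hits` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_hit_detector(X_train, y_train, random_state=0):
    """Fit a scaler, a novelty detector, and a stacking classifier."""
    scaler = StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)

    # Assumption: LOF in novelty mode stands in for the paper's ND step.
    novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_scaled)

    # Assumption: these base estimators are illustrative choices only.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
            ("svc", SVC(probability=True, random_state=random_state)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ).fit(X_scaled, y_train)
    return scaler, novelty, stack


def predict_hits(detector, X):
    """Return (labels, is_novel); label 1 means 'active', 0 'inactive'."""
    scaler, novelty, stack = detector
    X_scaled = scaler.transform(X)

    # LOF.predict returns -1 for samples unlike the training data.
    is_novel = novelty.predict(X_scaled) == -1
    labels = stack.predict(X_scaled)

    # Assumption: unknown patterns are reported as active rather than
    # trusted to a classifier that never saw anything similar.
    labels = np.where(is_novel, 1, labels)
    return labels, is_novel


# Synthetic stand-in for per-sample HCS feature vectors.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

detector = fit_hit_detector(X_train, y_train)
labels, is_novel = predict_hits(detector, X_test)
print("accuracy on held-out data:", np.mean(labels == y_test))
print("samples flagged as novel:", int(is_novel.sum()))

# A sample far outside the training distribution should trip the ND step.
far_point = X_train.mean(axis=0, keepdims=True) + 50.0
far_labels, far_flags = predict_hits(detector, far_point)
print("far point flagged as novel:", bool(far_flags[0]))
```

Running the novelty check before the classifier lets the pipeline handle profiles the stacking model was never trained on. Whether such samples should count as hits, be set aside for review, or be handled differently depends on the assay and is not stated in the abstract.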
