Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes – Biotechnological implications

Journal

BIOTECHNOLOGY ADVANCES
Volume 54

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.biotechadv.2021.107822

Keywords

Essential gene; Machine learning (ML); Prediction; Eukaryote; Model organisms; Caenorhabditis; Drosophila; Parasite

Funding

  1. National Health and Medical Research Council (NHMRC) of Australia
  2. Australian Research Council (ARC)
  3. Yourgene Health
  4. Research Training Program Scholarship via The University of Melbourne
  5. Fiocruz, Brazil (Fundacao Oswaldo Cruz/Instituto Aggeu Magalhaes-IAM)

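High-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, but most other organisms, including many parasites, lack comparable functional genomic platforms. This creates a need for innovative techniques for predicting and investigating essential genes. Machine learning (ML) approaches offer the potential to reliably predict gene essentiality and could be a valuable tool for biological research.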
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less-studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake a historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to new fundamental and applied research. Evidence of some commonality in the essential gene complement between these two organisms indicates that an ML-engineering approach could find broader applicability to other ecdysozoans, such as parasitic nematodes or arthropods, provided that suitably large and informative data sets are, or become, available for proper feature engineering and for the robust training and validation of algorithms.
This area warrants detailed exploration, for example, to facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important given the substantial worldwide impact of such diseases, the current challenges associated with their prevention and control, and drug resistance in parasite populations.
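The abstract describes a general workflow: engineer features from 'omic data, train a classifier on genes of known essentiality, validate it, and rank other genes for functional follow-up. The following is a minimal, hypothetical sketch of that workflow, not the authors' pipeline. It swaps in plain logistic regression as the classifier, and the three per-gene features (`conservation`, `expression`, `ppi_degree`), all helper names and the synthetic data are assumptions made purely for illustration.

```python
import math
import random

# ASSUMPTION: hypothetical per-gene attributes, for illustration only:
#   conservation: fraction of reference species with an ortholog (0-1)
#   expression:   fraction of developmental stages with expression (0-1)
#   ppi_degree:   number of protein-protein interaction partners


def engineer(gene):
    """Feature engineering: log-transform the skewed degree, add an interaction term."""
    c, e, d = gene["conservation"], gene["expression"], gene["ppi_degree"]
    return [c, e, math.log1p(d), c * e]


def fit_scaler(rows):
    """Per-column means and standard deviations for z-score scaling."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
           for col, m in zip(cols, means)]
    return means, sds


def scale(rows, means, sds):
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=1000):
    """Logistic regression fitted by batch gradient descent (no regularisation)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(scores, labels):
    """Probability that a random essential gene outscores a random non-essential one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def toy_gene(rng, essential):
    """Synthetic gene: essential ones are made more conserved, expressed and connected."""
    if essential:
        return {"conservation": rng.uniform(0.6, 1.0),
                "expression": rng.uniform(0.5, 1.0), "ppi_degree": rng.randint(5, 60)}
    return {"conservation": rng.uniform(0.0, 0.7),
            "expression": rng.uniform(0.0, 0.8), "ppi_degree": rng.randint(0, 20)}


rng = random.Random(0)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
genes = [toy_gene(rng, bool(l)) for l in labels]

# Train on the first 200 labelled genes; hold out the last 100 for validation.
raw = [engineer(g) for g in genes]
means, sds = fit_scaler(raw[:200])  # scaler fitted on training genes only
X_train, X_test = scale(raw[:200], means, sds), scale(raw[200:], means, sds)
w, b = train_logistic(X_train, labels[:200])
test_auc = roc_auc(predict(X_test, w, b), labels[200:])

# Prioritise unlabelled candidate genes by predicted probability of essentiality.
candidates = {f"gene_{i}": toy_gene(rng, rng.random() < 0.3) for i in range(5)}
probs = predict(scale([engineer(g) for g in candidates.values()], means, sds), w, b)
ranked = sorted(zip(candidates, probs), key=lambda kv: kv[1], reverse=True)

print(f"held-out AUC: {test_auc:.3f}")
for name, p in ranked:
    print(name, round(p, 3))
```

Fitting the scaler and the model on training genes only keeps the held-out genes unseen, so the AUC is an honest estimate for this toy setup. With real data, the classifier, features and validation scheme (for example, repeated cross-validation) would need to be chosen and justified for the organism and data sets at hand.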
