☆ 4.6 Article

Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions

PLOS ONE (2022)

Journal

PLOS ONE

Volume 17, Issue 6, Pages -

Publisher

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pone.0256411

Keywords

Ask authors/readers for more resources

Protocol

Community support

Reagent

Community support

Automated Summary New
Abstract

This study improves the accuracy and prediction capability of identifying translation initiation sites in genes associated with neurologic repeat expansion disorders by optimizing feature selection, data sampling, and machine learning algorithms.

A number of neurologic diseases associated with expanded nucleotide repeats, including an inherited form of amyotrophic lateral sclerosis, have an unconventional form of translation called repeat-associated non-AUG (RAN) translation. It has been speculated that the repeat regions in the RNA fold into secondary structures in a length-dependent manner, promoting RAN translation. Repeat protein products are translated, accumulate, and may contribute to disease pathogenesis. Nucleotides that flank the repeat region, especially ones closest to the initiation site, are believed to enhance translation initiation. A machine learning model has been published to help identify ATG and near-cognate translation initiation sites; however, this model has diminished predictive power due to its extensive feature selection and limited training data. Here, we overcome this limitation and increase prediction accuracy by the following: a) capture the effect of nucleotides most critical for translation initiation via feature reduction, b) implement an alternative machine learning algorithm better suited for limited data, c) build comprehensive and balanced training data (via sampling without replacement) that includes previously unavailable sequences, and d) split ATG and near-cognate translation initiation codon data to train two separate models. We also design a supplementary scoring system to provide an additional prognostic assessment of model predictions. The resultant models have high performance, with similar to 85-88% accuracy, exceeding that of the previously published model by >18%. The models presented here are used to identify translation initiation sites in genes associated with a number of neurologic repeat expansion disorders. The results confirm a number of sites of translation initiation upstream of the expanded repeats that have been found experimentally, and predict sites that are not yet established.

Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions

Journal

PLOS ONE

Publisher

PUBLIC LIBRARY SCIENCE

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Machine learning predicts translation initiation sites in neurologic diseases with nucleotide repeat expansions

Journal

PLOS ONE

Publisher

PUBLIC LIBRARY SCIENCE

Keywords

Categories

Ask authors/readers for more resources

Protocol

Reagent

Authors

I am an author on this paper

Reviews

Primary Rating

Secondary Ratings

Novelty

Significance

Scientific rigor

Rate this paper

Recommended

Export Citation

Share Paper