Article

Tailored machine learning models for functional RNA detection in genome-wide screens

Journal

NAR GENOMICS AND BIOINFORMATICS
Volume 5, Issue 3

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nargab/lqad072

This article introduces a software framework for in silico prediction of non-coding and protein-coding genetic loci, which allows for the alignment-based training, evaluation, and application of machine learning models with user-defined parameters. Instead of using the one-size-fits-all approach of pervasive in silico annotation pipelines, this framework focuses on the structured generation and evaluation of models based on arbitrary features and input data, aiming for stable and explainable results. Furthermore, the software package is applied to a full-genome screen of Drosophila melanogaster and evaluated against the well-known but less flexible program RNAz.
The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics, aiming in particular at identifying properties of nucleotide sequences that are informative of their biological role in the cell. We present a software framework for the alignment-based training, evaluation, and application of machine learning models with user-defined parameters. Rather than following the one-size-fits-all approach of pervasive in silico annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, with an emphasis on stable and explainable results. Furthermore, we showcase our software package in a full-genome screen of Drosophila melanogaster and evaluate our results against the well-known but much less flexible program RNAz.
