Article

Accurate identification of RNA D modification using multiple features

Journal

RNA BIOLOGY
Volume 18, Issue 12, Pages 2236-2246

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1080/15476286.2021.1898160

Keywords

Dihydrouridine; prediction; imbalanced datasets; feature selection; XGBoost

Funding

  1. Natural Science Foundation of China [61902259, 62002242]
  2. Natural Science Foundation of Guangdong Province [2018A0303130084]
  3. Scientific Research Foundation in Shenzhen [JCYJ20170818100431895, JCYJ20180306172207178]
  4. Post-doctoral Foundation Project of Shenzhen Polytechnic [6020330003K]


The researchers proposed a novel predictor, iRNAD_XGBoost, to identify potential D modification sites in tRNAs using multiple RNA sequence representations. The optimized model showed high accuracy in cross-validation tests and demonstrated consistent prediction efficiencies for positive and negative samples.

As one of the common post-transcriptional modifications in tRNAs, dihydrouridine (D) has prominent effects on tRNA flexibility as well as on cancerous diseases. Because the sequencing techniques used to detect D modification are expensive and time-consuming, precise computational tools can greatly advance the study of molecular mechanisms and medical development. We propose a novel predictor, called iRNAD_XGBoost, to identify potential D sites using multiple RNA sequence representations. In this method, class imbalance is handled with the hybrid sampling method SMOTEENN, and the top 30 features selected by XGBoost are used to construct the model. The optimized model achieved high sensitivity (Sn) and specificity (Sp) values of 97.13% and 97.38%, respectively, in the jackknife test. On the independent test, these two metrics reached 91.67% and 94.74%, respectively. Compared with the iRNAD method, this model showed high generalizability and consistent prediction performance for positive and negative samples, yielding satisfactory Matthews correlation coefficient (MCC) scores of 0.94 and 0.86 in the jackknife and independent tests, respectively. It is inferred that the chemical property and nucleotide density features (CPND), the electron-ion interaction pseudopotential (EIIP and PseEIIP), and the dinucleotide composition (DNC) are crucial to the recognition of D modification. The proposed predictor is a promising tool to help experimental biologists investigate molecular functions.
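
The three metrics quoted in the abstract (Sn, Sp, and MCC) all follow from a binary confusion matrix. The sketch below uses their standard definitions. It is not the authors' code, and the counts are hypothetical: they were chosen only because 11/12 and 18/19 reproduce the reported independent-test percentages, while the actual test-set sizes are not given here.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true D sites that are predicted as D sites (Sn)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-D sites that are predicted as non-D sites (Sp)."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (an assumption, not taken from the paper):
# 12 positive samples with 1 missed, 19 negative samples with 1 false alarm.
tp, fn, tn, fp = 11, 1, 18, 1

sn_percent = round(100 * sensitivity(tp, fn), 2)
sp_percent = round(100 * specificity(tn, fp), 2)
mcc_value = round(mcc(tp, tn, fp, fn), 2)

print(f"Sn = {sn_percent}%  Sp = {sp_percent}%  MCC = {mcc_value}")
```

Unlike Sn or Sp taken alone, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative samples are imbalanced, which is the setting the abstract describes.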
