Article

A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs

Journal

BMC BIOINFORMATICS
Volume 22, Issue 1

Publisher

BMC
DOI: 10.1186/s12859-021-04256-8

Keywords

LncRNA; Multi-omics data; Machine learning; Neural network; Node embedding; Cancer

Funding

  1. National Key R&D Program of China [2019YFB1404700, 2018AAA0100100]
  2. National Natural Science Foundation of China [61861146002, 61732012, 61932008]
  3. Natural Science Foundation of Shandong Province, China [ZR2020QF038]

In this study, a new machine learning approach called LGDLDA is proposed for predicting disease-related lncRNAs by combining multi-omics data, machine learning methods, and neural-network aggregation of neighborhood information. The method integrates neighborhood information from similarity matrices and uses embedded node representations to approximate the observed matrices, enabling accurate and effective prediction of disease-related lncRNAs. Compared with other lncRNA-disease prediction methods, it showed better stability and improved performance in predicting cancer-related lncRNAs.
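Aggregating neighborhood information from similarity matrices and then approximating an observed lncRNA-disease matrix with node embeddings can be illustrated with a small NumPy sketch. This is not LGDLDA's published model. It assumes a single tanh aggregation layer over row-normalized lncRNA and disease similarity matrices, free embedding parameters trained by plain gradient descent on a regularized squared reconstruction loss, and unobserved pairs treated as zeros. The function names (`row_normalize`, `fit_embeddings`, `rank_candidates`), the hyperparameters, and the toy data are all hypothetical.

```python
import numpy as np


def row_normalize(S):
    """Scale each row of a non-negative similarity matrix to sum to 1."""
    S = np.asarray(S, dtype=float)
    sums = S.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero rows (isolated nodes) as zeros
    return S / sums


def fit_embeddings(A, S_lnc, S_dis, k=3, lam=0.01, lr=0.01, n_iter=500, seed=0):
    """Hypothetical sketch: learn lncRNA and disease embeddings whose inner
    products approximate the observed association matrix A.

    Free parameters are averaged over each similarity graph and passed through
    tanh (a one-layer, neural-style neighborhood aggregation), then trained by
    gradient descent on ||A - U V^T||^2 plus an L2 penalty.
    """
    A = np.asarray(A, dtype=float)
    P, Q = row_normalize(S_lnc), row_normalize(S_dis)
    rng = np.random.default_rng(seed)
    U0 = 0.1 * rng.standard_normal((A.shape[0], k))  # free lncRNA parameters
    V0 = 0.1 * rng.standard_normal((A.shape[1], k))  # free disease parameters
    history = []
    for _ in range(n_iter):
        U = np.tanh(P @ U0)  # aggregate neighbor information for lncRNAs
        V = np.tanh(Q @ V0)  # aggregate neighbor information for diseases
        R = A - U @ V.T      # residual of the approximation
        loss = (R ** 2).sum() + lam * ((U0 ** 2).sum() + (V0 ** 2).sum())
        history.append(loss)
        # Backpropagate through the inner product, the tanh and the aggregation.
        gU = (-2 * R @ V) * (1 - U ** 2)
        gV = (-2 * R.T @ U) * (1 - V ** 2)
        U0 -= lr * (P.T @ gU + 2 * lam * U0)
        V0 -= lr * (Q.T @ gV + 2 * lam * V0)
    U, V = np.tanh(P @ U0), np.tanh(Q @ V0)
    return U @ V.T, history


def rank_candidates(A, scores, top=5):
    """Return the highest-scoring lncRNA-disease pairs not already known in A."""
    A = np.asarray(A)
    rows, cols = np.nonzero(A == 0)  # unobserved pairs are the candidates
    order = np.argsort(-scores[rows, cols])
    return [(int(rows[t]), int(cols[t]), float(scores[rows[t], cols[t]]))
            for t in order[:top]]


# Toy data (hypothetical): 5 lncRNAs x 4 diseases, random symmetric similarities.
A = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]])
rng = np.random.default_rng(1)
S_lnc = rng.random((5, 5))
S_lnc = (S_lnc + S_lnc.T) / 2
S_dis = rng.random((4, 4))
S_dis = (S_dis + S_dis.T) / 2

scores, history = fit_embeddings(A, S_lnc, S_dis)
top = rank_candidates(A, scores, top=3)
print(top)
```

Scoring only the zero entries of `A` mirrors the final ranking step: known associations are excluded, and the remaining pairs are ordered by their reconstructed score. A real pipeline would replace the random similarities with the multi-omics similarity matrices and would use a trained multi-layer network rather than this single fixed-form layer.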
Background

LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, the true mechanisms underlying associations between lncRNAs and complex diseases are difficult to uncover. The unprecedented abundance of multi-omics data and the rapid development of machine learning technology provide an opportunity to design a machine learning framework for studying the relationship between lncRNAs and complex diseases.

Results

In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs based on multi-omics data, machine learning methods, and neural network aggregation of neighborhood information. First, LGDLDA computes separate similarity matrices for lncRNAs, genes, and diseases. LncRNA similarity is calculated from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix, and the lncRNA-protein interaction matrix. Gene similarity is obtained from the lncRNA-gene association matrix and the gene-disease association matrix, and disease similarity is obtained from the disease ontology, the disease-miRNA association matrix, and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices through nonlinear feature learning with a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs.

Conclusions

Compared with existing lncRNA-disease prediction methods, the proposed method takes more critical information into account and improves performance in predicting cancer-related lncRNAs. Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulated datasets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer datasets, covering gastric cancer, colorectal cancer and breast cancer, to predict potential cancer-related lncRNAs.
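One ingredient of the disease similarity, Gaussian interaction profile (GIP) kernel similarity, has a widely used standard form: K(d_i, d_j) = exp(-γ_d ||IP(d_i) - IP(d_j)||²), with bandwidth γ_d = γ'_d / ((1/n_d) Σ_k ||IP(d_k)||²) and γ'_d usually set to 1. The sketch below implements that standard form in NumPy. It assumes that each disease's interaction profile IP(d) is its column of a binary lncRNA-disease association matrix. The abstract does not say exactly which profiles or parameters LGDLDA uses, so the function name `gip_kernel`, the default `gamma_prime=1.0`, and the toy matrix are illustrative assumptions.

```python
import numpy as np


def gip_kernel(IP, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of IP (hypothetical helper).

    Each row is one entity's binary interaction profile. The bandwidth is
    normalised by the mean squared profile norm, as in the usual GIP formulation.
    """
    IP = np.asarray(IP, dtype=float)
    sq_norms = (IP ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * IP @ IP.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-gamma * d2)


# Toy binary lncRNA-disease association matrix (hypothetical): 5 lncRNAs x 4 diseases.
A = np.array([[1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]])
K = gip_kernel(A.T)  # disease profiles are the columns of A
print(np.round(K, 3))
```

The result is a symmetric disease-by-disease matrix with ones on the diagonal and values in (0, 1]. The same function applied to `A` instead of `A.T` would give the analogous lncRNA GIP similarity. How LGDLDA combines this kernel with the ontology-based and miRNA-based similarities is not specified here.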
