4.6 Article Proceedings Paper

A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations

Journal

BMC BIOINFORMATICS
Volume 22, Issue SUPPL 6, Pages -

Publisher

BMC
DOI: 10.1186/s12859-021-03999-8

Keywords

Non-coding variants; Epigenome; Semi-supervised learning; Deep learning; Pseudo label

Funding

  1. Japan Society for the Promotion of Science (JSPS) KAKENHI [19H03213]
  2. JSPS KAKENHI [20K06606]
  3. Grants-in-Aid for Scientific Research [19H03213, 20K06606] Funding Source: KAKEN


Understanding the functional effects of non-coding variants is crucial in studying gene-expression regulation and disease development. A novel semi-supervised deep learning model with pseudo labeling has been proposed to improve predictive performance, especially in dealing with limited datasets.
Background: Understanding the functional effects of non-coding variants is important because they are often associated with altered gene expression and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic difficulty of dealing with scarce data makes further improvement of the algorithms necessary. In this work, we propose a novel method that employs a semi-supervised deep-learning model with pseudo labels and learns from both experimentally annotated and unannotated data.

Results: We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in the GM12878, HepG2, and K562 cell lines. Applied to this dataset, our method demonstrated outstanding performance compared with existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained on data from one cell line is unlikely to succeed in other cell lines, which implies that the effects of non-coding variants are cell-type specific. Remarkably, we found that DNA accessibility contributes significantly to the functional consequence of variants, which suggests that an open chromatin conformation is important before non-coding variants can interact with gene regulation.

Conclusions: The semi-supervised deep learning model coupled with pseudo labeling is advantageous when working with limited datasets, which are not unusual in biology. Our study provides an effective approach for finding non-coding mutations potentially associated with various biological phenomena, including human diseases.
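The abstract does not give implementation details, so the following is only a minimal sketch of the general pseudo-labeling idea, not the authors' model. It assumes a plain logistic-regression classifier in place of the deep network, two hypothetical synthetic features standing in for signals such as DNA accessibility and a histone mark, and illustrative values for the confidence threshold (`threshold`) and the pseudo-label weight (`alpha`). All function names are made up for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the bias; the remaining weights pair with the features in x.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit(X, y, sample_weight, epochs=200, lr=0.5):
    # Weighted logistic regression trained by full-batch gradient descent.
    w = [0.0] * (len(X[0]) + 1)
    total = sum(sample_weight)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t, s in zip(X, y, sample_weight):
            err = s * (predict_proba(w, x) - t)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / total for wi, g in zip(w, grad)]
    return w


def pseudo_label_train(X_lab, y_lab, X_unlab, rounds=3, threshold=0.9, alpha=0.5):
    # Step 1: supervised fit on the annotated examples only.
    w = fit(X_lab, y_lab, [1.0] * len(X_lab))
    for _ in range(rounds):
        # Step 2: keep confident predictions on unannotated data as pseudo labels.
        X_pl, y_pl = [], []
        for x in X_unlab:
            p = predict_proba(w, x)
            if p >= threshold or p <= 1.0 - threshold:
                X_pl.append(x)
                y_pl.append(1.0 if p >= 0.5 else 0.0)
        if not X_pl:
            break
        # Step 3: refit on annotated plus pseudo-labeled data (pseudo labels down-weighted).
        weights = [1.0] * len(X_lab) + [alpha] * len(X_pl)
        w = fit(X_lab + X_pl, y_lab + y_pl, weights)
    return w


def make_points(rng, center, n, sd=0.5):
    # Hypothetical 2-feature examples scattered around (center, center).
    return [[rng.gauss(center, sd), rng.gauss(center, sd)] for _ in range(n)]


rng = random.Random(0)
X_lab = make_points(rng, 1.0, 5) + make_points(rng, -1.0, 5)
y_lab = [1.0] * 5 + [0.0] * 5
X_unlab = make_points(rng, 1.0, 100) + make_points(rng, -1.0, 100)

w = pseudo_label_train(X_lab, y_lab, X_unlab)
print("P(functional | high signal):", round(predict_proba(w, [1.5, 1.5]), 3))
print("P(functional | low signal): ", round(predict_proba(w, [-1.5, -1.5]), 3))
```

Down-weighting pseudo-labeled examples with `alpha` is one common way to limit the damage from wrong pseudo labels. Whether the paper uses a fixed weight, a schedule, or a confidence threshold at all is not stated in the abstract.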
