4.2 Article

Machine learning techniques for pathogenicity prediction of non-synonymous single nucleotide polymorphisms in human body

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s12652-021-03581-3

Keywords

Artificial neural network; Machine learning techniques; Mutation; Non-synonymous Single Nucleotide Polymorphisms (nsSNPs); Pathogenic; Variants

The proposed framework aims to distinguish disease-causing missense variants from neutral mutations by predicting the pathogenicity of SNP-induced amino acid changes. It pairs an attribute selection tool with a machine learning classifier so that the selected attribute set is optimized for classification accuracy. On benchmark datasets, the framework outperformed other methods in accuracy, and an artificial neural network was found to be the best machine learning technique for this task.
The rapid growth of human genetic variation data produced by high-throughput genotyping and sequencing methods has motivated bioinformaticians to find ways to handle this volume of data. Single nucleotide polymorphisms (SNPs) are considered one of the important sources of human genome variability and are involved in many human diseases, such as cancer. A non-synonymous SNP (nsSNP) in the coding region changes an amino acid, which may alter protein function and affect cell proliferation, and is therefore a cause of many diseases. In this research, a framework is proposed for pathogenicity prediction, that is, for distinguishing disease-causing missense variants from neutral mutations. The framework has two main components: an attribute selection tool and a classifier based on machine learning techniques (MLTs). The attribute selection tool works in conjunction with the classifier to select a set of attributes that classifies the dataset with the highest possible accuracy using a minimal number of attributes. The tool is based on a swarm intelligence optimization approach, which optimizes the set of selected attributes. An artificial neural network is the MLT adopted in this research, while decision tree and k-nearest neighbor classifiers are investigated for comparison. The framework was evaluated on benchmark datasets, where its results outperformed those of other methods. The best artificial neural network models achieved accuracies of 82%, 96%, and 76.3% on NovelVar, VaribenchSelectedPure, and SwissvarFilteredMix, respectively. For further validation, the three datasets were combined into one large dataset, on which the achieved accuracy was 82.9%.
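The wrapper idea described in the abstract, where an attribute selector and a classifier work together, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not name the swarm intelligence algorithm, so binary particle swarm optimization is used as one common example; a k-nearest neighbor classifier stands in for the artificial neural network to keep the code dependency-free; the data is synthetic; and the function names, the coefficients `w`, `c1`, `c2`, and the `penalty` weight are hypothetical choices, not the authors' settings.

```python
import math
import random


def knn_accuracy(X_train, y_train, X_test, y_test, mask, k=3):
    """Accuracy of a k-NN classifier restricted to the attributes where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty attribute set cannot classify anything
    correct = 0
    for x, label in zip(X_test, y_test):
        # Rank training points by squared Euclidean distance on the selected columns.
        dists = sorted(
            (sum((x[j] - t[j]) ** 2 for j in cols), yt)
            for t, yt in zip(X_train, y_train)
        )
        votes = [yt for _, yt in dists[:k]]
        correct += max(set(votes), key=votes.count) == label
    return correct / len(y_test)


def fitness(mask, data, penalty=0.01):
    """Reward accuracy and lightly penalize each selected attribute (assumed weight)."""
    return knn_accuracy(*data, mask) - penalty * sum(mask)


def binary_pso(data, n_features, n_particles=10, n_iters=20, seed=0):
    """Search for a 0/1 attribute mask with binary particle swarm optimization."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, data) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                # The sigmoid maps velocity to the probability that bit j is set.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i], data)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def make_toy_data(n, n_features, rng):
    """Synthetic two-class data in which only attribute 0 carries signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] = label * 4.0 + rng.gauss(0, 0.5)
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(42)
X_train, y_train = make_toy_data(60, 6, rng)
X_test, y_test = make_toy_data(30, 6, rng)
data = (X_train, y_train, X_test, y_test)
best_mask, best_fit = binary_pso(data, n_features=6)
print("selected attributes:", [j for j, bit in enumerate(best_mask) if bit])
print("fitness:", round(best_fit, 3))
```

The fitness function is where the two components meet: each candidate mask is scored by training and testing the classifier on only the selected attributes, and the small per-attribute penalty pushes the swarm toward compact sets. In a real study the score would come from cross-validation on the training data rather than from a single held-out split.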
