Article

Machine learning techniques for pathogenicity prediction of non-synonymous single nucleotide polymorphisms in human body

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s12652-021-03581-3

Keywords

Artificial neural network; Machine learning techniques; Mutation; Non-synonymous Single Nucleotide Polymorphisms (nsSNPs); Pathogenic; Variants


The proposed framework aims to distinguish disease-causing missense variants from neutral mutations by predicting the pathogenicity of SNP-induced amino acid changes. It pairs an attribute selection tool with a machine learning-based classifier so that the set of selected attributes is optimized for classification accuracy. The framework was evaluated on benchmark datasets, where it outperformed other methods in terms of accuracy. An artificial neural network was found to be the best machine learning technique for this task.
The rapid growth of human genetic variation data resulting from high-throughput genotyping and sequencing methods has motivated bioinformaticians to find ways to handle this volume of data. Single nucleotide polymorphisms (SNPs) are considered one of the important sources of human genome variability and are involved in many human diseases, such as cancer. A non-synonymous SNP (nsSNP) mutation in a coding region causes an amino acid change, which may alter protein function and affect cell proliferation, and is therefore implicated in many diseases. In this research, a framework is proposed for pathogenicity prediction, that is, for distinguishing disease-causing missense variants from neutral mutations. The framework contains two main components: an attribute selection tool and a classifier based on machine learning techniques (MLTs). The attribute selection tool works in conjunction with the classifier to select a minimal set of attributes that classifies the dataset with the highest possible accuracy. The tool is based on a swarm intelligence optimization approach, which optimizes the set of selected attributes. An artificial neural network is the adopted MLT in this research, while decision tree and k-nearest neighbor classifiers are investigated for comparison. Benchmark datasets were used to evaluate the proposed framework, with promising results. The proposed framework outperformed other methods on the same benchmark datasets. The best artificial neural network models achieved accuracies of 82%, 96%, and 76.3% for NovelVar, VaribenchSelectedPure, and SwissvarFilteredMix, respectively. For further validation, the three datasets were combined into one large dataset, on which the achieved accuracy was 82.9%.
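
The wrapper-style coupling of attribute selection and classifier described above can be illustrated with a small sketch. The abstract does not name the specific swarm algorithm, the fitness function, or any hyperparameters, so everything here is an assumption: binary particle swarm optimization stands in for the swarm intelligence approach, a 1-nearest-neighbor classifier stands in for the neural network (to keep the example dependency-free), and the data are synthetic. The function names and the 0.01 per-attribute penalty are hypothetical.

```python
import math
import random


def make_data(n=40, n_noise=6, seed=0):
    """Synthetic data (assumption): attributes 0 and 1 carry the label, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [3.0 * label + rng.gauss(0, 0.5), -3.0 * label + rng.gauss(0, 0.5)]
        noise = [rng.gauss(0, 3.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on the masked attributes."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty attribute set cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Reward accuracy, penalize each selected attribute (hypothetical weighting)."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)


def binary_pso(X, y, n_particles=10, n_iter=15, seed=1):
    """Binary PSO: each particle is a 0/1 mask over attributes."""
    rng = random.Random(seed)
    dim = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                # Velocity update: inertia plus pulls toward personal and global bests.
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))
                # A sigmoid of the velocity gives the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    X, y = make_data()
    mask, fit = binary_pso(X, y)
    print("selected attributes:", [j for j, keep in enumerate(mask) if keep])
    print("fitness:", round(fit, 3))
```

The key design point is that the classifier sits inside the fitness function: every candidate attribute subset is scored by training and evaluating the classifier on it, and the size penalty steers the swarm toward smaller subsets when accuracy is tied. Replacing `loo_accuracy` with a cross-validated neural network score would bring the sketch closer to the setup the abstract describes, at a much higher computational cost per evaluation.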
