EightyDVec: a method for protein sequence similarity analysis using physicochemical properties of amino acids

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/21681163.2021.1956369

Keywords

Sequence similarity; amino acids; physicochemical property; Markov chain transition matrix; phylogenetic

This research introduces an efficient alignment-free tool, EightyDVec, for protein sequence comparison. The method generates feature vectors from the physicochemical properties of amino acids so that sequences can be compared directly through those vectors. Validation on four datasets indicated that EightyDVec is effective for similarity analysis of protein sequences.
Similarity analysis of protein sequences can reveal the evolutionary relationships among them. Effective computational algorithms are needed to compare the very large number of available sequences. Alignment-based approaches to this problem are often computationally expensive, especially when the number of sequences is large. This research aims to develop an efficient alignment-free tool for protein sequence comparison and phylogenetic study. The proposed method, EightyDVec, generates features based on the physicochemical properties of amino acids that best describe the evolutionary relationships among the species in a protein family. EightyDVec transforms protein sequences into 80-dimensional feature vectors, and sequences are then compared through these vectors. Four different datasets are used to validate the accuracy of EightyDVec, and the results show that the proposed method is effective in the similarity analysis of protein sequences.
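
The abstract does not spell out how the 80 dimensions are built, so the following is only a rough illustration of the general idea of alignment-free comparison through fixed-length vectors, not the authors' published construction. It assumes a hypothetical four-way grouping of amino acids by side-chain character and a hypothetical layout of 20 amino acids x 4 groups = 80 entries, each holding the Markov-style probability that a given residue is followed by a residue of a given group. The names `GROUPS`, `feature_vector`, and `euclidean` are invented for this sketch.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical physicochemical grouping (an assumption for illustration only).
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}

AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
GROUP_OF = {aa: g for g, members in enumerate(GROUPS.values()) for aa in members}


def feature_vector(seq):
    """Map a protein sequence to an 80-dimensional vector.

    Entry 4*i + g is the fraction of transitions leaving amino acid i
    that land on a residue of group g (row-normalised transition counts).
    Residues outside the 20 standard letters are skipped.
    """
    counts = [[0.0] * len(GROUPS) for _ in AMINO_ACIDS]
    for a, b in zip(seq, seq[1:]):
        if a in AA_INDEX and b in GROUP_OF:
            counts[AA_INDEX[a]][GROUP_OF[b]] += 1
    vec = []
    for row in counts:
        total = sum(row)
        # Rows with no observed transitions stay all-zero.
        vec.extend(c / total if total else 0.0 for c in row)
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    s1 = feature_vector("MKVLAAGIDEKR")
    s2 = feature_vector("MKVLSAGIDERR")
    print(len(s1), euclidean(s1, s2))
```

Because every sequence maps to a vector of the same length regardless of sequence length, a pairwise distance matrix over many sequences can be computed without any alignment step and then passed to a standard tree-building method.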
