Singular value decomposition of protein sequences as a method to visualize sequence and residue space

PROTEIN SCIENCE (2022)

Journal

PROTEIN SCIENCE

Volume 31, Issue 10, Pages -

Publisher

WILEY

DOI: 10.1002/pro.4422

Keywords

bioinformatics; protein design; singular value decomposition; taxonomy

Category

Biochemistry & Molecular Biology

Funding

National Institute of General Medical Sciences [GM068462]
Johns Hopkins
National Institutes of Health

Summary

Singular value decomposition (SVD) of multiple sequence alignments (MSAs) is an important method for identifying and extracting features of sequence subgroups that are related to protein structure, function, stability, and taxonomy. This article provides a comprehensive description of the mathematics behind SVD and demonstrates its application through examples and analysis of protein families. Python scripts for SVD analysis of MSAs are also provided.

Abstract

Singular value decomposition (SVD) of multiple sequence alignments (MSAs) is an important and rigorous method to identify subgroups of sequences within the MSA, and to extract consensus and covariance sequence features that define the alignment and distinguish the subgroups. This information can be correlated to structure, function, stability, and taxonomy. However, the mathematics of SVD is unfamiliar to many in the field of protein science. Here, we attempt to present an intuitive yet comprehensive description of SVD analysis of MSAs. We begin by describing the underlying mathematics of SVD in a way that is both rigorous and accessible. Next, we use SVD to analyze sequences generated with a simplified model in which the extent of sequence conservation and covariance between different positions is controlled, to show how conservation and covariance produce features in the decomposed coordinate system. We then use SVD to analyze alignments of two protein families, the homeodomain and the Ras superfamilies. Both families show clear evidence of sequence clustering when projected into singular value space. We use k-means clustering to group MSA sequences into specific clusters, show how the residues that distinguish these clusters can be identified, and show how these clusters can be related to taxonomy and function. We end by describing a set of Python scripts that can be used for SVD analysis of MSAs, displaying results, and identifying and analyzing sequence clusters. These scripts are freely available on GitHub.
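
The workflow outlined in the abstract (encode the MSA numerically, decompose it with SVD, then cluster sequences in the decomposed coordinates) can be illustrated with a minimal sketch. This is not the authors' GitHub code. The toy alignment is made up, the function names `one_hot_encode`, `farthest_first_centers`, and `kmeans` are hypothetical, and the choices to put sequences in rows, to skip the first singular axis when clustering, and to seed k-means deterministically rather than randomly are all assumptions made to keep the example small and reproducible.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap character


def one_hot_encode(msa):
    """Encode an MSA (list of equal-length strings) as a binary matrix.

    Assumed layout: one row per sequence; each alignment position
    contributes len(ALPHABET) columns, exactly one of which is 1.
    """
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    n_seq, n_pos = len(msa), len(msa[0])
    F = np.zeros((n_seq, n_pos * len(ALPHABET)))
    for s, seq in enumerate(msa):
        for p, aa in enumerate(seq):
            F[s, p * len(ALPHABET) + index[aa]] = 1.0
    return F


def farthest_first_centers(points, k):
    """Deterministic seeding (an assumption, not standard random init):
    start at points[0], then add the point farthest from chosen centers."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    return np.array(centers)


def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations: assign to nearest center, then re-average."""
    centers = farthest_first_centers(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Made-up toy alignment with two subgroups (rows 0-2 and rows 3-5).
msa = ["MKVLA", "MKVLG", "MKILA", "MRTWA", "MRTWG", "MRSWA"]

F = one_hot_encode(msa)
U, s, Vt = np.linalg.svd(F, full_matrices=False)

# Coordinates of each sequence along the singular axes.
coords = U * s

# Cluster on axes 2 and 3; the first axis is skipped here on the assumption
# that it mostly reflects what all sequences have in common.
labels = kmeans(coords[:, 1:3], k=2)
print(labels)
```

For real alignments, a library implementation of k-means with multiple random restarts would likely be a better choice than the deterministic seeding used here, which was picked only so that the toy example gives the same grouping on every run.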
