Article

Information content of protein sequences

Journal

JOURNAL OF THEORETICAL BIOLOGY
Volume 206, Issue 3, Pages 379-386

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
DOI: 10.1006/jtbi.2000.2138

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimations of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is in the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides. © 2000 Academic Press.
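
The comparison described in the abstract (entropy and compression estimates on real sequences versus randomized surrogates) can be illustrated with a minimal Python sketch. This is not the authors' method: the plug-in block-entropy estimator, the use of `zlib` as the compressor, the shuffling surrogate, and the function names `block_entropy`, `compressed_bits_per_residue`, and `shuffled_surrogate` are all assumptions made for illustration, and the toy sequence is randomly generated rather than real protein data.

```python
import math
import random
import zlib
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def block_entropy(seq: str, k: int = 1) -> float:
    """Plug-in Shannon entropy in bits per residue, from overlapping k-mer counts."""
    n_blocks = len(seq) - k + 1
    if n_blocks <= 0:
        raise ValueError("sequence is shorter than the block length")
    counts = Counter(seq[i:i + k] for i in range(n_blocks))
    h = -sum((c / n_blocks) * math.log2(c / n_blocks) for c in counts.values())
    return h / k


def compressed_bits_per_residue(seq: str) -> float:
    """Crude upper bound on complexity: zlib output size per residue (assumed compressor)."""
    data = seq.encode("ascii")
    return 8 * len(zlib.compress(data, 9)) / len(data)


def shuffled_surrogate(seq: str, seed: int = 0) -> str:
    """Surrogate with the same residue composition but no positional correlations."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical toy input: a random string, NOT real protein data.
    rng = random.Random(42)
    toy = "".join(rng.choice(AMINO_ACIDS) for _ in range(5000))
    surrogate = shuffled_surrogate(toy, seed=1)
    for name, s in (("toy", toy), ("surrogate", surrogate)):
        print(name,
              round(block_entropy(s, 1), 3),
              round(block_entropy(s, 2), 3),
              round(compressed_bits_per_residue(s), 3))
```

With real data, a gap between the sequence and its shuffled surrogate at block lengths above 1 would reflect correlations between residues. The plug-in estimator is biased downward when the number of possible k-mers is large relative to the sample size, which is one form of the finite sample effects the abstract mentions, so values for larger `k` on short inputs should be read with caution.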
