Article

Information content of protein sequences

JOURNAL OF THEORETICAL BIOLOGY (2000)

Journal

JOURNAL OF THEORETICAL BIOLOGY

Volume 206, Issue 3, Pages 379-386

Publisher

ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD

DOI: 10.1006/jtbi.2000.2138

Categories

Biology; Mathematical & Computational Biology

Abstract

The complexity of large sets of non-redundant protein sequences is measured. This is done by estimating the Shannon entropy as well as applying compression algorithms to estimate the algorithmic complexity. The estimators are also applied to randomly generated surrogates of the protein data. Our results show that proteins are fairly close to random sequences. The entropy reduction due to correlations is only about 1%. However, precise estimates of the entropy of the source are not possible due to finite sample effects. Compression algorithms also indicate that the redundancy is on the order of 1%. These results confirm the idea that protein sequences can be regarded as slightly edited random strings. We discuss secondary structure and low-complexity regions as causes of the redundancy observed. The findings are related to numerical and biochemical experiments with random polypeptides. © 2000 Academic Press.
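The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the estimators or compressors used, so the plug-in block-entropy estimator, the shuffle-based surrogate, and the use of `zlib` below are all assumptions chosen for illustration, as are the function names.

```python
import math
import random
import zlib
from collections import Counter


def block_entropy(seq: str, k: int = 1) -> float:
    """Plug-in Shannon entropy (bits) of overlapping blocks of length k.

    Assumption: a naive frequency-count estimator. It is biased downward
    for finite samples, which is the effect the abstract warns about.
    """
    blocks = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(blocks)
    counts = Counter(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shuffled_surrogate(seq: str, seed: int = 0) -> str:
    """Random permutation of seq: same composition, correlations destroyed."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def compressed_bits_per_residue(seq: str) -> float:
    """Compressed size in bits per residue, using zlib as a stand-in compressor.

    Assumption: any general-purpose compressor gives only an upper bound
    on algorithmic complexity.
    """
    return 8 * len(zlib.compress(seq.encode("ascii"), 9)) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy input; a real analysis would need a large
    # non-redundant sequence set.
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ" * 20
    surrogate = shuffled_surrogate(protein)
    for name, s in (("original", protein), ("surrogate", surrogate)):
        h1 = block_entropy(s, 1)
        h2 = block_entropy(s, 2) - h1  # conditional entropy of a residue given its predecessor
        print(name, round(h1, 3), round(h2, 3), round(compressed_bits_per_residue(s), 3))
```

Comparing the conditional entropy and the compressed size of the original against its shuffled surrogate gives a rough sense of how much redundancy is due to correlations rather than to residue composition alone. The toy input here is artificially repetitive, so its numbers say nothing about real proteins.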
