Article

Revisiting the relationship between compositional sequence complexity and periodicity

COMPUTATIONAL BIOLOGY AND CHEMISTRY (2008)

Journal

COMPUTATIONAL BIOLOGY AND CHEMISTRY

Volume 32, Issue 1, Pages 17-28

Publisher

ELSEVIER SCI LTD

DOI: 10.1016/j.compbiolchem.2007.09.001

Keywords

information; hidden periodicity; nucleosome positioning; entropy; E. coli

Categories

Biology; Computer Science, Interdisciplinary Applications

Abstract

Background: Given a large sequence fragment or a set of functionally related sequences, we consider two problems of sequence analysis. The first is to measure sequence complexity (repetitiveness, compactness) in order to estimate how informative the set is as a whole. Such a measure usually has to be compared with an appropriate random background calculated by permuting the given sequences. We propose a novel and effective approach to measuring background information that replaces the usual sequence reshuffling. The second problem is to detect a periodic bias and determine whether it is a feature of the set. Sequence periodicity, including so-called hidden periodicity, is a very basic genomic property. Two periods have been discovered and described: a period of 3, which is considered characteristic of coding sequences, and a period of 10-11, which may be due to the alternation of hydrophobic and hydrophilic amino acids, DNA curvature, and bendability. Searching for periodic biases has brought significant results in the study of sequence-dependent nucleosome positioning: nucleosomal sites carry a hidden period of about 10.4 bases.

Results: The calculated differences between genomic sequences and background showed the high biological relevance of the method proposed in this study. Our algorithm was applied to several natural and artificial datasets. We constructed a simple periodic dataset by replacing every tenth dinucleotide in each sequence of a trial set with the same dinucleotide, CC. We showed that the method reveals the introduced periodicity and that this periodic pattern carries more information than uninterrupted subsequences do. Applying the method to the nucleosomal dataset revealed a weak pseudo-periodicity of 10.4 nucleotides, confirming previous knowledge. Applying the method to Escherichia coli datasets revealed the well-known periodicity of 3 bp as a genic attribute, a secondary genic period slightly larger than 11 bp, and an intergenic period slightly smaller than 11 bp.

Conclusions: We report a novel method for sequence analysis based on compositional complexity. We found that the difference between the sequence complexity of a natural sequence and that of the background is especially high for a set consisting exclusively of coding sequences. Hidden periodicities were found without any preliminary assumptions about the composition of the periodic elements. We illustrated the power of the method by studying sets with known weak periodic properties: a nucleosomal database and sets of different regions of E. coli. We showed that the method conveniently indicates all kinds of periodicity and related features in these sets of DNA sequences. (C) 2007 Elsevier Ltd. All rights reserved.
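The artificial test described in the Results (planting CC at a fixed spacing and checking that the period can be recovered against a permuted background) can be illustrated with a small sketch. This is not the authors' complexity-based algorithm, which the abstract does not specify. It is a plain distance-count stand-in, and it uses ordinary reshuffling for the background, which is exactly the step the paper proposes to replace. The function names, the sequence length of 200, the set size of 50, and the reading of "every tenth dinucleotide" as "the two bases starting at every tenth position" are all assumptions made for illustration.

```python
import random
from collections import Counter

def plant_dinucleotide(seq, motif="CC", period=10):
    """Overwrite the bases starting at every `period`-th position with `motif`.

    Assumption: this is one possible reading of "every tenth dinucleotide".
    """
    bases = list(seq)
    for start in range(0, len(bases) - len(motif) + 1, period):
        bases[start:start + len(motif)] = motif
    return "".join(bases)

def distance_counts(seqs, motif="CC", max_dist=15):
    """Count how often `motif` recurs at each distance 1..max_dist, summed over seqs."""
    counts = Counter()
    for seq in seqs:
        hits = [i for i in range(len(seq) - len(motif) + 1)
                if seq.startswith(motif, i)]
        hit_set = set(hits)
        for i in hits:
            for d in range(1, max_dist + 1):
                if i + d in hit_set:
                    counts[d] += 1
    return counts

def shuffled(seq, rng):
    """Return a random permutation of `seq` (the conventional background)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
trial = ["".join(rng.choice("ACGT") for _ in range(200)) for _ in range(50)]
periodic = [plant_dinucleotide(s) for s in trial]
background = [shuffled(s, rng) for s in periodic]

signal = distance_counts(periodic)
noise = distance_counts(background)
# Distance 1 is skipped because overlapping CC hits inside C runs dominate it.
best = max(range(2, 16), key=lambda d: signal[d] - noise[d])
print("distance with the largest excess over background:", best)
```

The search is limited to distances below 20 so that multiples of the planted period do not compete with the period itself. A weak or fractional period such as the 10.4 bases reported for nucleosomal sites would need a more sensitive measure than raw counts.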