A measure representation of protein sequences, similar to the measure representation of DNA sequences proposed in our previous paper [Yu et al., Phys. Rev. E 64, 031903 (2001)], is introduced, together with an induced measure. Multifractal analysis is then performed on these two kinds of measures for a large number of protein sequences derived from the corresponding complete genomes. From the values of the D_q (generalized dimension) spectra and the related C_q (analogous specific heat) curves, we conclude that these protein sequences are not completely random. For substrings of length K = 5, the D_q spectra of all organisms studied are multifractal-like and smooth enough for the C_q curves to be meaningful. The C_q curves of all bacteria resemble a classical phase transition at a critical point, whereas the analogous phase transitions of the higher organisms studied exhibit a double-peaked specific heat function. However, the multifractal property alone is not sufficient for the classification problem. When the measure representations of protein sequences from complete genomes are treated as time series, a method based on correlation analysis after removing some memory from the time series is proposed to construct a phylogenetic tree. The resulting tree is shown to be reasonably satisfactory.
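To make the quantities above concrete, here is a minimal sketch, not the paper's implementation, of a measure representation and the D_q and C_q estimates. It assumes: (1) each overlapping K-string is mapped to one of |A|^K equal subintervals of [0, 1) by reading it as a base-|A| number, with a hypothetical amino-acid ordering `AMINO_ACIDS`; (2) D_q is estimated by box counting, regressing log Z_eps(q) = log sum mu(B)^q on log eps over box sizes eps = |A|^-k, k = 1..K, with tau(q) = (q - 1) D_q; and (3) the analogous specific heat is taken as C_q = -d^2 tau / dq^2, approximated by a central difference. All function names are illustrative.

```python
import math

# Hypothetical ordering of the 20 standard amino acids (one-letter codes);
# the ordering used in the paper is an assumption here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def measure_representation(seq, K, alphabet=AMINO_ACIDS):
    """Masses of the len(alphabet)**K equal subintervals of [0, 1).

    A K-string is read as a base-len(alphabet) number to get its subinterval
    index; its mass is its frequency among the counted overlapping K-strings.
    Substrings containing letters outside the alphabet are skipped.
    """
    b = len(alphabet)
    index = {a: i for i, a in enumerate(alphabet)}
    counts = [0] * (b ** K)
    total = 0
    for start in range(len(seq) - K + 1):
        word = seq[start:start + K]
        if any(c not in index for c in word):
            continue
        pos = 0
        for c in word:
            pos = pos * b + index[c]
        counts[pos] += 1
        total += 1
    if total == 0:
        raise ValueError("no valid K-strings in sequence")
    return [c / total for c in counts]


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def coarse_grain(mu, b, K, k):
    """Merge the b**K fine subintervals into b**k boxes of size b**-k."""
    width = b ** (K - k)
    return [sum(mu[i:i + width]) for i in range(0, len(mu), width)]


def generalized_dimension(mu, b, K, q):
    """Estimate D_q from box sizes eps = b**-k, k = 1..K (requires K >= 2)."""
    log_eps, ys = [], []
    for k in range(1, K + 1):
        # Empty boxes are excluded so that negative q stays finite.
        boxes = [m for m in coarse_grain(mu, b, K, k) if m > 0]
        log_eps.append(-k * math.log(b))
        if q == 1:
            # Information dimension: slope of sum(mu log mu) vs log eps.
            ys.append(sum(m * math.log(m) for m in boxes))
        else:
            ys.append(math.log(sum(m ** q for m in boxes)))  # log Z_eps(q)
    s = _slope(log_eps, ys)  # s estimates tau(q) for q != 1
    return s if q == 1 else s / (q - 1)


def specific_heat(mu, b, K, q, h=1.0):
    """C_q = -d^2 tau/dq^2 via a central difference, tau(p) = (p - 1) D_p."""
    def tau(p):
        return (p - 1) * generalized_dimension(mu, b, K, p)
    return -(tau(q + h) - 2 * tau(q) + tau(q - h)) / h ** 2
```

For example, over the two-letter alphabet "AB", the sequence "AAABABBBAA" contains each of the 8 possible 3-strings exactly once, so its measure is uniform and `generalized_dimension(mu, 2, 3, q)` returns 1 (up to rounding) for every q, with C_q near 0; a non-random sequence yields a D_q that varies with q. With the full 20-letter alphabet and K = 5 the measure has 20^5 = 3,200,000 subintervals, which this pure-Python sketch handles slowly; an array library would be the natural choice at that scale.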