
Alignment-Free Sequence Comparison (I): Statistics and Power

JOURNAL OF COMPUTATIONAL BIOLOGY (2009)

Journal

JOURNAL OF COMPUTATIONAL BIOLOGY

Volume 16, Issue 12, Pages 1615-1634

Publisher

MARY ANN LIEBERT, INC

DOI: 10.1089/cmb.2009.0198

Keywords

alignment-free; normal approximation; normal distribution; sequence alignment; word count statistics

Funding

EPSRC [GR/R52183/01]
BBSRC
EPSRC
National University of Singapore
NIH [P50 HG 002790, R21AG032743]
NATIONAL HUMAN GENOME RESEARCH INSTITUTE [P50HG002790, R21HG006199] Funding Source: NIH RePORTER
NATIONAL INSTITUTE ON AGING [R21AG032743] Funding Source: NIH RePORTER


Abstract

Large-scale comparison of the similarities between two biological sequences is a major issue in computational biology; a fast method, the $D_2$ statistic, relies on comparing the k-tuple content of both sequences. Although it has been known for some years that the $D_2$ statistic is not suitable for this task, as it tends to be dominated by single-sequence noise, to date no suitable adjustments have been proposed. In this article, we suggest two new variants of the $D_2$ word count statistic, which we call $D_2^S$ and $D_2^*$. For $D_2^S$, which is a self-standardized statistic, we show that the statistic is asymptotically normally distributed as sequence lengths tend to infinity, and that it is not dominated by the noise in the individual sequences. The second statistic, $D_2^*$, outperforms $D_2^S$ in terms of power for detecting the relatedness between the two sequences in our examples; however, although it is straightforward to simulate from the asymptotic distribution of $D_2^*$, we cannot provide a closed form for power calculations.
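
To make the word-count idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the usual definition of $D_2$ as the sum over all k-tuples w of the product of their counts in the two sequences. The abstract does not state formulas for the two variants; the centered forms used here are the ones commonly quoted for these statistics and should be treated as assumptions, as should the i.i.d. letter model and the hypothetical names `kmer_counts`, `d2`, `d2_variants`, and `letter_probs`.

```python
from collections import Counter
from itertools import product
from math import prod, sqrt


def kmer_counts(seq, k):
    """Count overlapping k-tuples (words) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 as the sum over words w of X_w * Y_w (assumed standard definition)."""
    x, y = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # Words missing from either sequence contribute zero, so only shared words matter.
    return sum(x[w] * y[w] for w in x.keys() & y.keys())


def d2_variants(seq_a, seq_b, k, letter_probs):
    """Return (D2S, D2star) under an assumed i.i.d. letter model.

    Assumed forms, with centered counts Xc = X_w - n_bar * p_w and
    Yc = Y_w - m_bar * p_w:
      D2S    = sum_w Xc * Yc / sqrt(Xc**2 + Yc**2)
      D2star = sum_w Xc * Yc / (sqrt(n_bar * m_bar) * p_w)
    """
    x, y = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    n_bar = len(seq_a) - k + 1  # number of word positions in seq_a
    m_bar = len(seq_b) - k + 1  # number of word positions in seq_b
    d2s = 0.0
    d2star = 0.0
    for letters in product(letter_probs, repeat=k):
        w = "".join(letters)
        p_w = prod(letter_probs[c] for c in letters)  # word probability
        xc = x[w] - n_bar * p_w
        yc = y[w] - m_bar * p_w
        d2star += xc * yc / (sqrt(n_bar * m_bar) * p_w)
        denom = sqrt(xc * xc + yc * yc)
        if denom > 0:  # skip words whose centered counts are both zero
            d2s += xc * yc / denom
    return d2s, d2star
```

For example, `d2("ACGT", "ACGT", 2)` counts the three shared 2-tuples AC, CG, and GT once each. Centering the counts by their expected values is what removes the single-sequence contribution that the abstract identifies as the weakness of the uncentered $D_2$.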
