4.4 Article

An optimal algorithm for the maximum-density segment problem

Journal

SIAM JOURNAL ON COMPUTING
Volume 34, Issue 2, Pages 373-387

Publisher

SIAM PUBLICATIONS
DOI: 10.1137/S0097539704440430

Keywords

bioinformatics; biological sequence analysis; maximum-density segment; slope selection; computational geometry; sequence algorithm; data structure

Ask authors/readers for more resources

We address a fundamental problem arising from analysis of biomolecular sequences. The input consists of two numbers w(min) and w(max) and a sequence S of n number pairs (a(i), w(i)) with w(i) > 0. Let segment S(i, j) of S be the consecutive subsequence of S between indices i and j. The density of S(i, j) is d(i, j) = (a(i) + a(i+1) + ... + a(j))/(w(i) + w(i+1) + ... + w(j)). The maximum-density segment problem is to find a maximum-density segment over all segments S(i, j) with w(min) = w(i) + w(i+1) + ... + w(j) = w(max). The best previously known algorithm for the problem, due to Goldwasser, Kao, and Lu [Proceedings of the Second International Workshop on Algorithms in Bioinformatics, R. Guigo and D. Gusfield, eds., Lecture Notes in Comput. Sci. 2452, Springer-Verlag, New York, 2002, pp. 157-171], runs in O(n log(w(max)-w(min) + 1)) time. In the present paper, we solve the problem in O(n) time. Our approach bypasses the complicated right-skew decomposition, introduced by Lin, Jiang, and Chao [J. Comput. System Sci., 65 (2002), pp. 570-586]. As a result, our algorithm has the capability to process the input sequence in an online manner, which is an important feature for dealing with genome-scale sequences. Moreover, for a type of input sequences S representable in O(m) space, we show how to exploit the sparsity of S and solve the maximum-density segment problem for S in O(m) time.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.4
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available