Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics

by Michael H. Goldwasser, Ming-Yang Kao and Hsueh-I Lu

Abstract: We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A = < a1, a2, ..., an > of real numbers, a segment S is a consecutive subsequence < ai, ai+1, ..., aj >. The width of S is j-i+1, while the density is (sum{i <= k <=  j} ak)/(j-i+1). The maximum-density segment problem takes A and two integers L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. If U=n (or equivalently, U = 2L-1), we can solve the problem in O(n) time, improving upon the O(n log L)-time algorithm by Lin, Jiang and Chao for a general sequence A. Furthermore, if U and L are arbitrary, we solve the problem in O(n + n log(U-L+1)) time. There has been no nontrivial result for this case previously. Both results also hold for a weighted variant of the maximum-density segment problem.

Citation:
Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics
Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu
Proceedings of the Second Annual Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), 2002, pp. 157-171.
DOI:10.1145/645907.673125
Download as: [ abstract, pdf, pdf.gz, ps, ps.Z, ps.gz ]

A complete version (with improvements) later appeared as:
Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications
Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu
Journal of Computer and System Sciences, 70(2):128-144, 2005.
DOI:10.1016/j.jcss.2004.08.001
Download preprint as: [ abstract, pdf, pdf.gz, ps, ps.gz ]

Michael Goldwasser