Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics

by Michael H. Goldwasser, Ming-Yang Kao and Hsueh-I Lu

Abstract: We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A = < a₁, a₂, ..., a_n > of real numbers, a segment S is a consecutive subsequence < a_i, a_i+1, ..., a_j >. The width of S is j-i+1, while the density is (sum_{{i <= k <= j}} a_k)/(j-i+1). The maximum-density segment problem takes A and two integers L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. If U=n (or equivalently, U = 2L-1), we can solve the problem in O(n) time, improving upon the O(n log L)-time algorithm by Lin, Jiang and Chao for a general sequence A. Furthermore, if U and L are arbitrary, we solve the problem in O(n + n log(U-L+1)) time. There has been no nontrivial result for this case previously. Both results also hold for a weighted variant of the maximum-density segment problem.

Citation:: Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics; Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu; Proceedings of the Second Annual Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), 2002, pp. 157-171.; DOI:10.1145/645907.673125; Download as: [ abstract, pdf, pdf.gz, ps, ps.Z, ps.gz ]
A complete version (with improvements) later appeared as:: Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications; Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu; Journal of Computer and System Sciences, 70(2):128-144, 2005.; DOI:10.1016/j.jcss.2004.08.001; Download preprint as: [ abstract, pdf, pdf.gz, ps, ps.gz ]

Michael Goldwasser