Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

by Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu

Abstract: We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A of pairs (a_i,w_i) for i = 1, ..., n and w_i>0, a segment A(i,j) is a consecutive subsequence of A starting with index i and ending with index j. The width of A(i,j) is w(i,j) = sum_{{ i <= k <= j }} w_k and the density is (sum_{{ i <= k <= j }} a_k)/w(i,j). The maximum-density segment problem takes A and two values L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. When U is unbounded, we provide a relatively simple, O(n)-time algorithm, improving upon the O(n log L)-time algorithm by Lin, Jiang and Chao. We then extend this result, providing an O(n)-time algorithm for the case when both L and U are specified.

Citation:: Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications; Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu; Journal of Computer and System Sciences, 70(2):128-144, 2005.; DOI:10.1016/j.jcss.2004.08.001; Download preprint as: [ pdf, pdf.gz, ps, ps.gz ]
A preliminary version originally appeared as:: Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics; Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu; Proceedings of the Second Annual Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), 2002, pp. 157-171.; DOI:10.1145/645907.673125; Download as: [ abstract, pdf, pdf.gz, ps, ps.Z, ps.gz ]

Michael Goldwasser