Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications

by Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu

Abstract: We study an abstract optimization problem arising from biomolecular sequence analysis. For a sequence A of pairs (ai,wi) for i = 1, ..., n and wi>0, a segment A(i,j) is a consecutive subsequence of A starting with index i and ending with index j. The width of A(i,j) is w(i,j) = sum{ i <= k <= j } wk and the density is (sum{ i <= k <= j } ak)/w(i,j). The maximum-density segment problem takes A and two values L and U as input and asks for a segment of A with the largest possible density among those of width at least L and at most U. When U is unbounded, we provide a relatively simple, O(n)-time algorithm, improving upon the O(n log L)-time algorithm by Lin, Jiang and Chao. We then extend this result, providing an O(n)-time algorithm for the case when both L and U are specified.

Citation:
Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications
Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu
Journal of Computer and System Sciences, 70(2):128-144, 2005.
DOI:10.1016/j.jcss.2004.08.001
Download preprint as: [ pdf, pdf.gz, ps, ps.gz ]

A preliminary version originally appeared as:
Fast Algorithms for Finding Maximum-Density Segments of a Sequence with Applications to Bioinformatics
Michael H. Goldwasser, Ming-Yang Kao, and Hsueh-I Lu
Proceedings of the Second Annual Workshop on Algorithms in Bioinformatics (WABI), volume 2452 of Lecture Notes in Computer Science (Springer-Verlag, Berlin), 2002, pp. 157-171.
DOI:10.1145/645907.673125
Download as: [ abstract, pdf, pdf.gz, ps, ps.Z, ps.gz ]

Michael Goldwasser