Abstract: Current biological evidence suggests a correlation between the function and the position of genes in chromosomes. Examples include operon structure in prokaryotic genomes and similar expression patterns of neighboring genes in some eukaryotic genomes. In this paper, we present a new model and algorithm for identifying conserved gene clusters from pairwise genome comparison. Our model generalizes a recent model called "gene teams." A gene team is a set of orthologous genes that appear in two or more species, possibly in a different order, such that the distance between adjacent genes of the team on each chromosome never exceeds a given threshold. We remove the constraint in the original model that each gene must have a unique copy in the chromosomes, and thus allow the analysis of complex prokaryotic or eukaryotic genomes with extensive paralogs. Our algorithm runs in O(mn) time and uses O(m+n) space, where m and n are the numbers of common genes in the two chromosomes. We used this approach to study two bacterial genomes, E. coli and B. subtilis, and successfully identified 85 conserved clusters, including clusters containing uncharacterized genes and a large cluster consisting of 21 ribosomal proteins. Our implementation is publicly available at http://euler.slu.edu/~goldwasser/cogteams/.
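The distance condition in the gene-team definition can be illustrated with a minimal Python sketch. It is not the O(mn) algorithm of this paper: it assumes the original unique-copy model (one position per gene per chromosome), it checks only the threshold condition and not maximality, and the names `is_delta_chain`, `is_gene_team`, `delta`, and the toy positions are hypothetical, chosen for illustration.

```python
def is_delta_chain(positions, delta):
    """Return True if consecutive sorted positions differ by at most delta."""
    ordered = sorted(positions)
    return all(b - a <= delta for a, b in zip(ordered, ordered[1:]))


def is_gene_team(team, chrom_a, chrom_b, delta):
    """Check the distance condition for `team` on two chromosomes.

    chrom_a and chrom_b map each gene name to a single position
    (assumption: unique-copy model, no paralogs). Gene order may differ
    between the chromosomes; only the gaps between adjacent team members
    matter. Maximality of the team is not checked here.
    """
    if not team:
        return False
    # Every team member must occur on both chromosomes.
    if not all(g in chrom_a and g in chrom_b for g in team):
        return False
    return (is_delta_chain([chrom_a[g] for g in team], delta)
            and is_delta_chain([chrom_b[g] for g in team], delta))


# Toy data (hypothetical): gene -> position on each chromosome.
chrom_a = {"a": 1, "b": 3, "c": 4, "d": 10}
chrom_b = {"c": 2, "a": 3, "b": 6, "d": 7}

small_team_ok = is_gene_team({"a", "b", "c"}, chrom_a, chrom_b, delta=3)
large_team_ok = is_gene_team({"a", "b", "c", "d"}, chrom_a, chrom_b, delta=3)
```

In this toy input, the genes a, b, and c appear in a different order on the two chromosomes but stay within the threshold on both, whereas adding d opens a gap of 6 on the first chromosome. Handling multiple copies of a gene, as the model in this paper does, would require positions to map to gene families rather than to unique genes.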