Course Home | Assignments | Data Sets/Tools | Python | Schedule | Git Submission | Tutoring

Distance Measurement in Molecular Phylogenetics


Reading: Chapter 6

Overview: Our goal is to use genetic analyses to infer the evolutionary relationships among a collection of species and to produce phylogenetic trees that portray those likely relationships. The starting point for any such analysis is typically an accurate estimate of the pairwise distance between each pair of species.


Distance Measurements

Goal: to determine a metric that not only makes clear which species are more closely related, but can also be used to estimate the evolutionary time since two species diverged.

Consideration: How to measure distance will depend greatly on whether you are trying to do phylogenetic analysis of a group of closely related species (e.g., different types of whales), or a widely divergent group of species (e.g., the full tree of life).

Candidate measures:


Interpreting a conserved gene as a molecular clock

So let's assume that we have a specific gene that is conserved across all species in a study, and that we can compute a full pairwise sequence alignment of that gene for each pair of species. There is still the question of how best to use that gene as a "molecular clock" to estimate how much time has likely passed since two species diverged from each other. A simple approach is to count the number of substitutions in the reference gene between the two species, and then to presume that those substitutions happen at a consistent rate over time. That would allow us to estimate the elapsed time as directly proportional to the number of substitutions observed in the present-day sequences.
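The simple approach above can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function names `observed_differences` and `p_distance` are hypothetical, the two sequences are assumed to be already aligned to equal length, and skipping gapped columns (marked with `-`) is an assumption made here for simplicity.

```python
def observed_differences(seq_a, seq_b, gap="-"):
    """Count mismatched sites between two aligned sequences.

    Returns (mismatches, compared), where compared is the number of
    columns in which neither sequence has a gap character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored entirely
        compared += 1
        if a != b:
            mismatches += 1
    return mismatches, compared


def p_distance(seq_a, seq_b):
    """Fraction of compared sites at which the two sequences differ."""
    mismatches, compared = observed_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


if __name__ == "__main__":
    # Two made-up aligned fragments that differ at 2 of 8 sites.
    print(p_distance("ACGTACGT", "ACGTTCGA"))
```

Under the naive clock, a pair of species with twice the fraction of differing sites would be presumed to have diverged twice as long ago.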

However, that is an oversimplified model for the following reasons:

Our goal is to determine $K$, the number of substitutions that have accumulated between the two sequences, and $T$, the time since the two species diverged from their common ancestor.

From those, we can calculate the substitution rate, $r = \frac{K}{2T}$. The constant 2 appears in the denominator because both derived species have presumably been accumulating mutations for $T$ units of time since the presumed common ancestor. We consider three increasingly complex models for estimating $K$ from the observed sequences.
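As a small worked illustration of the formula $r = \frac{K}{2T}$, the following Python sketch computes the rate from $K$ and $T$, and inverts the same relation to estimate $T$ when a rate is already known. The function names and the numeric values are hypothetical, chosen only for the example, and $K$ is assumed here to be expressed in substitutions per site.

```python
def substitution_rate(k, t):
    """Rate r = K / (2T): both lineages have been diverging for time T."""
    if t <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2 * t)


def divergence_time(k, r):
    """Invert r = K / (2T) to get T = K / (2r)."""
    if r <= 0:
        raise ValueError("substitution rate must be positive")
    return k / (2 * r)


if __name__ == "__main__":
    # Made-up values: K = 0.1 substitutions per site, T = 50 million years.
    r = substitution_rate(0.1, 50e6)
    print(r)                        # substitutions per site per year
    print(divergence_time(0.1, r))  # recovers T, up to rounding
```

Calibrating $r$ on a pair of species whose divergence time is known (for example, from the fossil record) and then applying `divergence_time` to other pairs is one common way such a clock is used; it relies on the rate being roughly constant across the lineages compared.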


Michael Goldwasser
Last modified: Wednesday, 20 March 2019