Next: Kimura 2-parameter Correction Up: NUCLEOTIDE DISTANCE MEASURES Previous: Simple counts as

Jukes-Cantor Correction

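As the time of divergence between two sequences increases, the probability of a second substitution at any one nucleotide site increases, and the count of observed differences grows more slowly than the number of substitutions that have actually occurred. Simple counts are therefore not a desirable measure of distance; this slowdown must be accounted for. A solution to this problem was first given by Jukes and Cantor (1969; Evol. of Protein Molecules, Academic Press). Instead of calculating distance as a simple count, take the distance as

$$ d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D\right) $$

where D is the proportion of sites that differ between the two sequences, with variance

$$ \mathrm{Var}(d) = \frac{D(1 - D)}{n \left(1 - \frac{4}{3} D\right)^{2}} $$

for n sites compared (Kimura and Ohta 1972; J. Mol. Evol. 2:87-90).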
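As a rough illustration, the correction can be computed directly from a count of differing sites. The sketch below assumes the standard Jukes-Cantor form d = -(3/4) ln(1 - (4/3) D); the function name and the arguments `k` (number of differing sites) and `n` (number of sites compared) are chosen here for illustration and do not come from the original notes.

```python
import math


def jukes_cantor_distance(k, n):
    """Jukes-Cantor corrected distance from k differing sites out of n.

    Hypothetical helper: name and signature are illustrative only.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    D = k / n                      # observed proportion of differences
    arg = 1.0 - 4.0 * D / 3.0      # argument of the logarithm
    if arg <= 0.0:
        # D >= 0.75: the logarithm has no real value, so no correction exists
        raise ValueError("Jukes-Cantor distance is undefined for D >= 0.75")
    return -0.75 * math.log(arg)


if __name__ == "__main__":
    # The corrected distance exceeds the raw proportion, and the gap widens
    # as the sequences become more divergent.
    for k in (5, 10, 30, 60):
        print(k / 100, jukes_cantor_distance(k, 100))
```

Raising an error for D >= 0.75 is one possible design choice; returning infinity or a missing value would also be reasonable, depending on how downstream code handles saturated comparisons.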

A plot of this function for the same range of parameters as in Figure 1 is given in Figure 2. The figure shows that this distance measure increases linearly with time, which is one desirable property of a distance measure. This is termed the Jukes-Cantor correction to distance. Because the correction is a logarithmic function of the observed proportion of differences, it also makes clear that the observed differences do not accumulate linearly with time but level off.

Observe the large increase in the variance as time increases. As D approaches 0.75 over time, the variance grows without bound. The measure of distance therefore becomes increasingly unreliable as divergence time increases.
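The growth of the variance can be checked numerically. The sketch below assumes the usual large-sample approximation Var(d) = D(1 - D) / (n (1 - 4D/3)^2), which is the form commonly attributed to Kimura and Ohta (1972); the function name and the sample size n = 500 are illustrative assumptions only.

```python
def jc_variance(D, n):
    """Approximate sampling variance of the Jukes-Cantor distance.

    D is the observed proportion of differing sites (0 <= D < 0.75) and
    n is the number of sites compared. Hypothetical helper for illustration.
    """
    if n <= 0 or not 0.0 <= D < 0.75:
        raise ValueError("need n > 0 and 0 <= D < 0.75")
    denom = n * (1.0 - 4.0 * D / 3.0) ** 2   # shrinks to zero as D -> 0.75
    return D * (1.0 - D) / denom


if __name__ == "__main__":
    # The variance climbs steeply as D approaches the saturation value 0.75.
    for D in (0.1, 0.3, 0.5, 0.7, 0.74):
        print(D, jc_variance(D, 500))
```

The squared term in the denominator is what drives the blow-up: the numerator D(1 - D) stays bounded, while (1 - 4D/3)^2 tends to zero at D = 0.75.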

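Note that in expectation D is less than 0.75, but in real data D can equal or exceed 0.75. In that case the Jukes-Cantor correction is undefined, because the argument of the logarithm is zero or negative. Here you can apply a method developed by Tajima (1993, MBE 10:677-688). He suggests using the modified estimator

$$ \hat{d} = \sum_{i=1}^{k} \frac{k^{(i)}}{i \, b^{\,i-1} \, n^{(i)}} $$

where k is the number of sites that differ out of n sites compared, b = 3/4,

$$ k^{(i)} = \frac{k!}{(k-i)!} $$

and

$$ n^{(i)} = \frac{n!}{(n-i)!} $$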

With variance

This is actually just a different formulation of the same quantity using a Taylor series expansion to avoid the logarithm. This estimator of distance is defined for all parameter values and actually has less bias than Jukes and Cantor's original correction for small levels of divergence. Tajima provides similar adjustments to all of the corrections noted below.
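A minimal sketch of a series estimator of this kind follows. It assumes the form obtained by expanding -(3/4) ln(1 - (4/3) D) as a power series in D and replacing each power D^i by the ratio of falling factorials k(k-1)...(k-i+1) / (n(n-1)...(n-i+1)), where k is the number of differing sites and n the number of sites compared. This is a reading of Tajima's approach rather than a transcription of his paper, and the function name and signature are hypothetical.

```python
def tajima_jc_distance(k, n, b=0.75):
    """Series estimator of the Jukes-Cantor distance (sketch, see lead-in).

    Sums k^(i) / (i * b^(i-1) * n^(i)) for i = 1..k, where k^(i) and n^(i)
    are falling factorials. No logarithm is taken, so the sum is finite
    for every 0 <= k <= n, including k/n >= 0.75.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    total = 0.0
    # scaled holds k^(i) / (n^(i) * b^(i-1)); it is updated incrementally so
    # that neither the factorial ratio nor the power of b underflows alone.
    scaled = b
    for i in range(1, k + 1):
        scaled *= (k - i + 1) / ((n - i + 1) * b)
        total += scaled / i
    return total


if __name__ == "__main__":
    # Compare with the logarithmic formula at low divergence, and evaluate
    # a case (k/n = 0.8) where the logarithmic formula cannot be used.
    print(tajima_jc_distance(10, 100))
    print(tajima_jc_distance(80, 100))
```

The sum stops at i = k because every later falling factorial of k is zero, which is why the estimator remains finite even when the proportion of differences exceeds 0.75.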


