Next: IAM based F-statistics Up: Microsatellite's and Genetic Distance Previous: IAM based distance

SMM/TPM based distance measures

 

For each of the previous estimates of distance, the mutational process was assumed to be the IAM. As discussed previously, microsatellites are more likely to evolve via the SMM or TPM, and high levels of homoplasy exist. This homoplasy causes all of the above distance estimates to underestimate the true genetic distance. Slatkin (1995) and Goldstein et al. (1995a) developed equivalent distance measures that take the consequences of the SMM into account. Under the SMM the alleles themselves contain information about previous mutation events. For example, two alleles that differ by three repeat units must have undergone at least three mutational events since their most recent common ancestor. As such, for an allele descended from an ancestral allele with $a_0$ repeats,

$$a = a_0 + \sum_{i=1}^{k} x_i$$

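where $a$ is the number of repeats of an allele, $k$ is the number of mutations that have occurred, and $x_i$ is the increment in repeat length produced by the $i$th mutation. The expected mean of the distribution of $x$, i.e. $E(x)$, is zero, with variance $\sigma^2$. This model relates directly to the TPM described by Di Rienzo et al. (1994). As mentioned previously, when there are no mutations of large effect the model reduces to the SMM. Slatkin (1995) developed a genetic distance measure based on the average sum of squares of the differences in allele size. Average within population distance is calculated by

$$S_W = \frac{1}{d_s} \sum_{j=1}^{d_s} \frac{2}{n(n-1)} \sum_{i<i'} \left(a_{ij} - a_{i'j}\right)^2$$

where $d_s$ is the number of subpopulations $j$ examined, $n$ is the sample size of each population (taken here as the number of allele copies sampled), and $a_{ij}$ is the number of repeats of the $i$th allele in the $j$th subpopulation. Average between population distance is

$$S_B = \frac{2}{d_s(d_s-1)} \sum_{j<j'} \frac{1}{n^2} \sum_{i=1}^{n} \sum_{i'=1}^{n} \left(a_{ij} - a_{i'j'}\right)^2$$

and the average distance for the entire population is

$$\bar{S} = \frac{n-1}{d_s n - 1}\, S_W + \frac{n(d_s-1)}{d_s n - 1}\, S_B$$
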
The first two measures, $S_W$ (within populations) and $S_B$ (between populations), are equivalent to the distance measures $D_0$ and $D_1$ developed by Goldstein et al. (1995a). Goldstein et al. (1995a) compared $D_1$ with two IAM based distance measures using data generated via an SMM simulation. This study showed that both IAM based measures reach an asymptote faster than $D_1$, which remained linear for a longer period of time. However, $D_1$ was not superior in all respects, since it does not have the lowest variance of the three. Goldstein et al. (1995a) concluded that for relatively short periods of time an IAM based measure was the better choice, but that as time increased $D_1$ would become superior. When populations have only been separated for a short period of time (300 generations), the effect of mutation is minimal, most of the genetic difference should be the result of drift, and the IAM based estimates of distance are superior. As time increases (> 500 generations), mutation becomes an important force leading to genetic differentiation, and a distance based on these mutational events becomes superior (Goldstein et al. 1995a; Forbes et al. 1995). These generation times are based on a moderate population size and an assumed mutation rate.
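The within population, between population, and overall averages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each population is a list of repeat counts (one entry per sampled allele copy), all populations have the same sample size $n$, and the function names are hypothetical rather than taken from Slatkin (1995) or any software package.

```python
from itertools import combinations


def mean_sq_diff_within(pop):
    """Average squared repeat difference over all unordered pairs in one sample."""
    pairs = list(combinations(pop, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def mean_sq_diff_between(pop_a, pop_b):
    """Average squared repeat difference over all pairs, one copy from each sample."""
    total = sum((a - b) ** 2 for a in pop_a for b in pop_b)
    return total / (len(pop_a) * len(pop_b))


def average_sums_of_squares(pops):
    """Return (S_W, S_B, S_bar) for equally sized samples of repeat counts."""
    d = len(pops)
    n = len(pops[0])
    if d < 2 or n < 2 or any(len(p) != n for p in pops):
        raise ValueError("need at least two populations of equal size n >= 2")
    s_w = sum(mean_sq_diff_within(p) for p in pops) / d
    pop_pairs = list(combinations(pops, 2))
    s_b = sum(mean_sq_diff_between(p, q) for p, q in pop_pairs) / len(pop_pairs)
    # Weight S_W and S_B by the share of within and between pairs among all pairs.
    s_bar = ((n - 1) * s_w + n * (d - 1) * s_b) / (d * n - 1)
    return s_w, s_b, s_bar


# Made-up repeat counts for two populations at a single locus.
s_w, s_b, s_bar = average_sums_of_squares([[10, 10, 12], [14, 14, 16]])
print(s_w, s_b, s_bar)
```

In this toy data set the overall average equals the mean squared difference over all 15 pairs of the six sampled copies, which is a quick way to check the weighting of the within and between terms.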

Goldstein et al. (1995a) also studied how long $D_1$, the average squared distance between populations, remains linear with time under more realistic conditions. This was achieved by imposing a size constraint that limits the range of microsatellite repeats at each locus. They found that the more alleles a locus has, the more useful it is for the study of distantly related populations, because distance estimates from such loci remain linear with time for longer. Estimates from some loci may be linear for up to 1-2 million years. They point out that this difference in linearity may be a problem when combining data from different loci, and report working on a weighting scheme that may overcome this drawback. However, they also note that $D_1$ is still linear with time when data from a number of loci are combined (by taking the average $D_1$), and that this measure is still useful in most situations.
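Combining loci by taking the average distance can be sketched as follows. This is a minimal illustration, assuming the between population average squared distance is computed separately at each locus and then averaged with equal weights; the function names and the repeat counts are hypothetical, and no weighting scheme is attempted.

```python
def average_squared_distance(pop_a, pop_b):
    """Between population average squared repeat difference at one locus."""
    total = sum((a - b) ** 2 for a in pop_a for b in pop_b)
    return total / (len(pop_a) * len(pop_b))


def mean_distance_over_loci(loci_a, loci_b):
    """Unweighted mean of the per-locus distances between two populations."""
    if len(loci_a) != len(loci_b) or not loci_a:
        raise ValueError("both populations need the same, non-empty set of loci")
    per_locus = [average_squared_distance(a, b) for a, b in zip(loci_a, loci_b)]
    return sum(per_locus) / len(per_locus)


# Two loci, two sampled allele copies per population at each locus (made-up data).
loci_a = [[10, 12], [20, 20]]
loci_b = [[11, 13], [24, 22]]
print(mean_distance_over_loci(loci_a, loci_b))  # per-locus values 3.0 and 10.0, mean 6.5
```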

Slatkin (1995) investigated how the average sum of squares distance performs under the TPM. He showed that it is linear with time even when the model allows for a low level of multi-step mutations. Not surprisingly, as the model becomes more like the IAM, the measure performs progressively worse. Slatkin (1995) also cautions that if the mutation rate changes as the number of repeats increases, this measure may no longer be linear.

A modification of the average sum of squares distance method was made by Goldstein et al. (1995b). They developed a measure that is independent of population size when the populations are in mutation-drift equilibrium and that, like the average squared distance, is linear for longer periods of time, but with a lower initial variance than that measure,

$$(\delta\mu)^2 = (\mu_A - \mu_B)^2$$

where $\mu_A$ and $\mu_B$ are the mean numbers of repeats found in the alleles of populations A and B respectively. An analysis of human microsatellites shows that distance measures not based on the SMM are still superior to $(\delta\mu)^2$ for closely related populations. However, the $(\delta\mu)^2$ measure may be useful for the resolution of the deeper ancestral nodes (Goldstein et al. 1995b).
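The squared difference in mean repeat number is simple to compute, and a small sketch also shows how it relates to the average squared distance between populations. The code below is an illustration only, with hypothetical function names and made-up repeat counts. It relies on the arithmetic identity that the average squared difference over all between population pairs equals the squared difference of the means plus the two within sample variances (computed with divisor $n$), which is why subtracting the within population component leaves $(\delta\mu)^2$.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sample variance with divisor n (not n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def delta_mu_squared(pop_a, pop_b):
    """Squared difference between the mean repeat numbers of two samples."""
    return (mean(pop_a) - mean(pop_b)) ** 2


def average_squared_distance(pop_a, pop_b):
    """Average squared repeat difference over all between population pairs."""
    total = sum((a - b) ** 2 for a in pop_a for b in pop_b)
    return total / (len(pop_a) * len(pop_b))


pop_a = [10, 12, 14]  # mean 12
pop_b = [15, 15, 18]  # mean 16
dm2 = delta_mu_squared(pop_a, pop_b)
asd = average_squared_distance(pop_a, pop_b)
print(dm2)  # 16.0
# Identity check: between population average = (delta mu)^2 + both variances.
print(abs(asd - (dm2 + variance(pop_a) + variance(pop_b))) < 1e-9)  # True
```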


