
Modelling microsatellite evolution

 

Simulation studies are useful for investigating the mutational processes occurring at microsatellite loci. These loci are assumed to be selectively neutral (i.e. there are no selective constraints on microsatellite evolution), which makes the mutation-drift equilibrium models (the IAM and the SMM, introduced above) applicable. Valdes et al. (1993) compared the observed allele frequency distributions of 108 human microsatellite loci with those expected under the IAM and the SMM. They found that the observed distributions were consistent with those expected under the SMM in a population of constant size. However, because the data were not drawn from a single population, they could not conclude that the SMM is the correct description of the mutational process at these loci.
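To make the contrast between the two equilibrium models concrete, the following minimal sketch evaluates the standard textbook expectations for heterozygosity at mutation-drift equilibrium: H = 1 - 1/(1 + theta) under the IAM and H = 1 - 1/sqrt(1 + 2*theta) under the SMM, with theta = 4N*mu for a diploid population of constant size N. The function names and the value of theta are illustrative assumptions and are not taken from Valdes et al. (1993).

```python
import math


def iam_heterozygosity(theta):
    """Expected equilibrium heterozygosity under the infinite alleles model."""
    return 1.0 - 1.0 / (1.0 + theta)


def smm_heterozygosity(theta):
    """Expected equilibrium heterozygosity under the stepwise mutation model."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * theta)


if __name__ == "__main__":
    theta = 1.0  # hypothetical value of 4*N*mu, chosen only for illustration
    print("IAM:", round(iam_heterozygosity(theta), 4))
    print("SMM:", round(smm_heterozygosity(theta), 4))
```

For the same theta the SMM predicts lower heterozygosity than the IAM, because stepwise mutations can recreate allele sizes that already exist in the population.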

A second study, by Shriver et al. (1993), also investigated the agreement between observed values and values simulated under the SMM. This study compared three parameters: the number of alleles, the range of allele sizes, and the number of modes in the distribution of alleles. It examined loci from large homogeneous populations, to avoid the effects of population substructure, and loci with < 50% heterozygosity, so that the expectations of the IAM and the SMM could be distinguished. Three classes of VNTRs were examined: 31 microsatellites with 1-2 bp repeats, 12 microsatellites with 3-5 bp repeats, and 11 minisatellites (15-70 bp repeats). The comparisons indicated that all of the microsatellites with 3-5 bp repeats, 65% of the microsatellites with 1-2 bp repeats, and only 27% of the minisatellites matched the simulated values for all three measures. The minisatellite loci, and to a degree the 1-2 bp repeat microsatellite loci, are more similar to the expectations of the IAM. From this Shriver et al. (1993) concluded that the mutational processes at these three classes of VNTRs may be different.
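The three summary measures named above can be computed from a sample of allele sizes as in the sketch below. It assumes that allele sizes are expressed in repeat units and that a mode is a local peak in the allele-size frequency profile (with tied neighbouring sizes counted as one peak); the exact definition used by Shriver et al. (1993) may differ, and the function name and sample data are hypothetical.

```python
from collections import Counter


def allele_summary(sizes):
    """Number of alleles, size range, and number of modes for one locus."""
    counts = Counter(sizes)
    smallest, largest = min(counts), max(counts)
    # Count profile over every size in the range, padded with a zero at each end
    # so that peaks at the edges of the distribution are detected.
    profile = [0] + [counts.get(s, 0) for s in range(smallest, largest + 1)] + [0]
    # Collapse runs of equal counts so that a plateau is treated as a single peak.
    collapsed = [profile[0]]
    for value in profile[1:]:
        if value != collapsed[-1]:
            collapsed.append(value)
    n_modes = sum(
        1
        for i in range(1, len(collapsed) - 1)
        if collapsed[i] > collapsed[i - 1] and collapsed[i] > collapsed[i + 1]
    )
    return {
        "n_alleles": len(counts),
        "size_range": largest - smallest,
        "n_modes": n_modes,
    }


if __name__ == "__main__":
    # Hypothetical sample of allele sizes (in repeat units).
    sample = [10] * 2 + [11] * 5 + [12] * 3 + [14] * 4 + [15]
    print(allele_summary(sample))
```

The same function can be applied to both observed and simulated samples, so that the two sets of values are compared on an identical footing.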

Recently, Di Rienzo et al. (1994) developed a model, based on coalescence theory, that more accurately explains the variation observed at dinucleotide repeat microsatellites. The model predicts the expected variance in repeat number ($S^2$) under different mutational processes and demographic histories. This model, the Two Phase Model (TPM), incorporates the mutational process of the SMM but also allows mutations of a larger magnitude to occur. The main difference between the two models is that, under the TPM, once a mutation has occurred it has a probability $p$ of being a one-step mutation and a probability $1-p$ of being a multi-step mutation. If a multi-step mutation occurs, the change in repeat number is drawn from a symmetrical distribution $g$ with a given variance $\sigma_g^2$. Three parameters need to be specified: the mutation rate $\mu$, the fraction of one-step mutations $p$, and the variance of the distribution of multi-step mutations $\sigma_g^2$. The variance in the change in repeat number once a mutation has occurred can be calculated by

$$\sigma_m^2 = p + (1-p)\,\sigma_g^2$$

From this the expected variance in repeat number can be estimated by

$$E(S^2) = 2N\mu\,\sigma_m^2$$

where $2N$ is the expected time to coalescence of any two randomly chosen alleles in a diploid population of constant size $N$, and $\mu\sigma_m^2$ is the rate at which variance in repeat number accumulates per generation. Note that in this model the demographic history of the population (e.g. a recent expansion in population size) can be incorporated by replacing $2N$ with the appropriate coalescence time.
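The TPM calculation can be sketched as follows, assuming the usual form of the model: the variance of a mutational change is p + (1-p) times the variance of g (a one-step change of +1 or -1 has variance 1), and the expected variance in repeat number is the coalescence time multiplied by the mutation rate and by that per-mutation variance. The multi-step distribution used here (sizes of 2, 3 or 4 repeats, equally likely, in either direction) is a hypothetical stand-in for g, and all parameter values and function names are illustrative only.

```python
import random

# Hypothetical multi-step sizes; g is symmetric because the sign is drawn separately.
MULTI_STEPS = (2, 3, 4)
VAR_G = sum(k * k for k in MULTI_STEPS) / len(MULTI_STEPS)


def tpm_mutation_variance(p, var_g):
    """Variance of the change in repeat number, given that a mutation occurred."""
    return p + (1.0 - p) * var_g


def expected_repeat_variance(n_diploid, mu, p, var_g, coalescence_time=None):
    """Expected variance in repeat number; defaults to the constant-size time 2N."""
    t = 2 * n_diploid if coalescence_time is None else coalescence_time
    return t * mu * tpm_mutation_variance(p, var_g)


def draw_tpm_change(rng, p):
    """Draw one mutational change: one step with probability p, else multi-step."""
    sign = rng.choice((-1, 1))
    if rng.random() < p:
        return sign
    return sign * rng.choice(MULTI_STEPS)


if __name__ == "__main__":
    rng = random.Random(1)
    p = 0.8  # hypothetical fraction of one-step mutations
    draws = [draw_tpm_change(rng, p) for _ in range(200_000)]
    empirical = sum(d * d for d in draws) / len(draws)
    print("per-mutation variance (formula):  ", tpm_mutation_variance(p, VAR_G))
    print("per-mutation variance (simulated):", empirical)
    # Hypothetical N and mu, chosen only to show the calculation.
    print("expected variance in repeat number:",
          expected_repeat_variance(5000, 1e-3, p, VAR_G))
```

Passing a different `coalescence_time` is how a demographic history other than constant population size would enter this calculation.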

Di Rienzo et al. (1994) compared the values of homozygosity and the frequency of the most common allele expected under the TPM and the SMM (the TPM with p=1), assuming population expansion (a realistic situation for human populations), with the values observed at 10 dinucleotide repeat microsatellites genotyped in a well defined human population. This was achieved by running the TPM simulation until it produced a variance in repeat number similar to that observed at each locus. At this point, the homozygosity and the frequency of the most common allele produced by the simulation were compared with the observed values. They found that the SMM could be rejected for 8 of the 10 loci, and that with relatively high values of p the TPM was sufficient to explain the observed data. Further, the alternative situation, p=0 (simulating the IAM), could also be rejected.
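The quantities compared in this procedure can be computed from a sample of allele sizes as in the sketch below. It is only an illustration of the summary statistics, not the simulation of Di Rienzo et al. (1994): homozygosity is taken as the sum of squared allele frequencies, the variance is the unbiased sample variance, and the function name and data are hypothetical.

```python
from collections import Counter


def locus_statistics(sizes):
    """Variance in repeat number, homozygosity, and top allele frequency."""
    n = len(sizes)
    freqs = [count / n for count in Counter(sizes).values()]
    mean = sum(sizes) / n
    return {
        # Unbiased sample variance in repeat number.
        "variance": sum((s - mean) ** 2 for s in sizes) / (n - 1),
        # Homozygosity as the sum of squared allele frequencies.
        "homozygosity": sum(f * f for f in freqs),
        "max_allele_freq": max(freqs),
    }


if __name__ == "__main__":
    # Hypothetical sample of ten alleles (sizes in repeat units).
    observed = [10, 10, 10, 10, 11, 11, 12, 12, 12, 14]
    print(locus_statistics(observed))
```

In a comparison of this kind, the variance is the statistic used to calibrate the simulation, while the other two are the ones compared between simulated and observed samples.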

The data of Di Rienzo et al. (1994) also suggest that not all microsatellite loci evolve at the same rate. Di Rienzo et al. (1994) suggest that as the variance in repeat number increases, the frequency and/or magnitude (i.e. the number of repeats changed) of multi-step mutations increases. This is consistent with multi-step mutations being caused by unequal crossing over, and may explain the questionable association between increased repeat length and mutation rate. Weber (1990) and Hudson et al. (1992) concluded that loci with longer repeat lengths were more polymorphic, suggesting a higher mutation rate; however, others have found these data equivocal (Valdes et al. 1993). Although studies in prokaryotes suggest that longer fragments have a higher rate of slippage (Levinson and Gutman 1987b), in vitro studies have found no such association (Schlotterer and Tautz 1992).

The reason why the rate of unequal exchange should rise as the variance in repeat number increases is straightforward. As the variance in repeat number increases, so should the probability that a crossing over event causes an unequal exchange of more than one repeat unit. Even if the rate of crossing over remains constant, the rate of unequal exchange should increase. Because an unequal exchange does not change the mean size of the two original alleles (i.e. one smaller and one larger allele are generated), the overall mean allele size should not change as a result of this mutational process. Valdes et al. (1993) found no correlation between variance in repeat number and mean allele size, or between number of alleles and mean allele size. There is, however, a clear linear relationship between variance in repeat number and number of alleles (Valdes et al. 1993), as would be predicted by an increased rate of unequal exchange. This relationship is also predicted by the SMM (Valdes et al. 1993). It remains to be tested whether the observed relationship is more realistically modelled by the SMM, or by a TPM which incorporates an increased amount of unequal exchange as variance in repeat number increases.
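The claim that unequal exchange leaves the mean allele size unchanged can be illustrated with a minimal sketch. It assumes a simple model in which two repeat arrays break at possibly different repeat positions and swap their distal segments; the function name and the example values are hypothetical.

```python
def unequal_exchange(len_a, len_b, break_a, break_b):
    """Return the two product lengths after exchange at the given breakpoints."""
    if not (0 <= break_a <= len_a and 0 <= break_b <= len_b):
        raise ValueError("breakpoints must lie within the repeat arrays")
    # Proximal part of one array joined to the distal part of the other.
    product_1 = break_a + (len_b - break_b)
    product_2 = break_b + (len_a - break_a)
    return product_1, product_2


if __name__ == "__main__":
    # Two alleles of 20 repeats, misaligned by three repeat units.
    print(unequal_exchange(20, 20, 8, 11))
    # Aligned breakpoints leave both alleles unchanged.
    print(unequal_exchange(20, 20, 10, 10))
```

Whatever the misalignment, the two product lengths sum to the combined length of the parental alleles, so one product gains exactly as many repeats as the other loses.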

Implications of the microsatellite evolutionary process

It is clear that the evolution of microsatellites is a complex mutational process involving at least two different mechanisms. The prevalence of each mechanism varies according to the structure of the repeat unit itself. Microsatellites with 3-5 bp repeats seem to evolve according to the SMM, while those with 1-2 bp repeats are better described by the TPM. Further, the prevalence of single-step mutations decreases as the complexity of the repeat core increases. Although much further work is needed to fully understand the mutational processes, it is clear that care should be taken when interpreting microsatellite results. Most importantly, data from the different classes of microsatellites described above should not be treated as a single homogeneous data set. The prevalence of different mutational events may vary dramatically among the groups, confounding the interpretation of the data. Based on this idea, it has been suggested that microsatellites with a more IAM-like evolution (i.e. composite repeats) are the best suited to studying population questions such as population subdivision and genetic relationships (Estoup et al. 1995a, 1995b), since they will contain the lowest levels of homoplasy.


