Simulation studies are useful for investigating the mutational processes occurring at microsatellite loci. These loci are assumed to be selectively neutral (i.e. there are no selective constraints on microsatellite evolution), which makes the mutation-drift equilibrium models (the IAM and the SMM, introduced above) applicable. Valdes et al. (1993) compared the observed allele frequency distributions of 108 human microsatellite loci to those expected under the IAM and the SMM. They found that the observed distributions were consistent with those expected under the SMM in a population of constant size. However, because the data were not drawn from a single population, they could not conclude that the SMM is the correct description of the mutational process at these loci.
A second study, by Shriver et al. (1993), also investigated the agreement between observed values and values simulated under the SMM. It compared three parameters: the number of alleles, the range of allele sizes, and the number of modes in the allele frequency distribution. The study examined loci from large homogeneous populations, to avoid the effects of population substructure, and loci with < 50% heterozygosity, so that the expectations of the IAM and the SMM could be distinguished. Three classes of VNTRs were examined: 31 microsatellites with 1-2 bp repeats, 12 microsatellites with 3-5 bp repeats, and 11 minisatellites (15-70 bp repeats). The comparisons indicated that all the microsatellites with 3-5 bp repeats, 65% of the microsatellites with 1-2 bp repeats, and only 27% of the minisatellites matched the simulated values for all three measures. The minisatellite loci, and to a degree the 1-2 bp repeat microsatellite loci, are more similar to the expectations of the IAM. From this, Shriver et al. (1993) concluded that the mutational processes at these three classes of VNTRs may differ.
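The three summary measures compared by Shriver et al. (1993) can be computed directly from a sample of allele sizes. The following Python sketch is illustrative only: the function name `allele_summary`, the sample data, the choice to express the range in repeat units, and in particular the definition of a mode (a size class that is strictly more frequent than both neighbouring size classes) are assumptions, not details taken from the original study.

```python
from collections import Counter


def allele_summary(repeat_counts):
    """Return (number of alleles, size range, number of modes) for one locus.

    repeat_counts: allele sizes in repeat units, one entry per sampled
    chromosome.
    """
    freq = Counter(repeat_counts)
    sizes = sorted(freq)
    n_alleles = len(sizes)
    size_range = sizes[-1] - sizes[0]  # largest minus smallest allele

    # Assumption: a mode is a size class whose count strictly exceeds the
    # counts of both neighbouring size classes (absent classes count as 0).
    n_modes = 0
    for size in range(sizes[0], sizes[-1] + 1):
        count = freq.get(size, 0)
        if count > 0 and count > freq.get(size - 1, 0) and count > freq.get(size + 1, 0):
            n_modes += 1
    return n_alleles, size_range, n_modes


# Hypothetical sample with two clusters of allele sizes.
sample = [10, 10, 11, 11, 11, 12, 15, 15, 15, 15, 16]
print(allele_summary(sample))  # (5, 6, 2)
```

Under the SMM one would broadly expect a compact, often unimodal distribution, so a comparison of these measures against simulated values is one way to flag loci that depart from stepwise behaviour.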
More recently, Di Rienzo et al. (1994) developed a model, based on
coalescence theory, that more accurately explains the variation
observed at dinucleotide repeat microsatellites. It predicts the
expected variance in repeat number ($E(S^2)$) under different
mutational processes and demographic histories. This model, the Two
Phase Model (TPM), incorporates the mutational process of the SMM, but
allows for mutations of a larger magnitude to occur. The main
difference between the models is that once a mutation has occurred it
has a probability p of being a one step mutation, and a probability
1-p of being a multi-step mutation. If a multi-step mutation occurs,
then the change in repeat number is drawn from a symmetrical
distribution g with a given variance of $\sigma_g^2$. Three parameters
need to be specified: the mutation rate $\mu$, the fraction of one
step mutations p, and the variance of the distribution of multi-step
mutations $\sigma_g^2$. The variance in the change in repeat number
once a mutation has occurred can be calculated by

$$\sigma_m^2 = p + (1-p)\,\sigma_g^2$$

From this the expected variance in repeat number $E(S^2)$ can be
estimated by

$$E(S^2) = 2N\mu\sigma_m^2$$

where 2N is the expected time to coalescence of any 2 randomly chosen
alleles, in a diploid population of constant size N, and
$\mu\sigma_m^2$ is the rate at which variance in repeat number
accumulates per generation. Note that in this model the demographic
history of the population, e.g. a recent expansion in population size,
can be incorporated by substituting the appropriate coalescence time
for 2N.
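As a worked illustration of the two quantities just described, the Python sketch below treats a one step mutation (a change of +1 or -1) as contributing a variance of 1 and a multi-step mutation as contributing the variance of g, and then multiplies the resulting per-mutation variance by the mutation rate and the coalescence time. The function names and all parameter values are hypothetical, chosen only to show how a modest fraction of multi-step mutations inflates the expected variance.

```python
def mutation_variance(p, sigma_g2):
    """Variance of the change in repeat number, given that a mutation occurred.

    A one step mutation (+1 or -1, probability p) contributes variance 1;
    a multi-step mutation (probability 1 - p) contributes sigma_g2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * 1.0 + (1.0 - p) * sigma_g2


def expected_repeat_variance(mu, p, sigma_g2, coalescence_time):
    """Expected variance in repeat number: coalescence time x mu x sigma_m^2."""
    return coalescence_time * mu * mutation_variance(p, sigma_g2)


# Hypothetical parameter values, chosen only for illustration.
N = 10_000   # diploid population size, so 2N is the pairwise coalescence time
mu = 1e-3    # mutation rate per generation
for p in (1.0, 0.9):
    v = expected_repeat_variance(mu, p, sigma_g2=50.0, coalescence_time=2 * N)
    print(f"p = {p}: expected variance in repeat number = {v:.1f}")
```

With these made-up values, the strictly stepwise case (p = 1) gives an expected variance of about 20, while allowing 10% multi-step mutations raises it to about 118. A different demographic history would be represented here simply by passing a different `coalescence_time`.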
Di Rienzo et al. (1994) compared
the expected values of homozygosity and the frequency of the most
common allele under the TPM and the SMM (the TPM with p=1), under the
assumption of population expansion (a realistic situation for human
populations), to the values observed at 10 dinucleotide repeat
microsatellites genotyped in a well defined human population. This was
achieved by running the TPM simulation until it produced values of the
variance in repeat number similar to those observed at each locus. At
this point, the levels of heterozygosity and the frequency of the most
common allele produced by the simulation were compared to the observed
values. They found that the SMM could be rejected at 8 of the 10 loci,
and that with relatively high values of p the TPM was sufficient to
explain the observed data. Further, the alternative extreme, p=0
(simulating the IAM), could also be rejected.
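The mutation step of the TPM can be sketched in a few lines of Python. This is a minimal illustration, not the simulation used by Di Rienzo et al. (1994): the function names are hypothetical, and the multi-step distribution g is assumed here to be a zero-mean normal rounded to the nearest whole repeat (with zero changes redrawn), which is symmetrical but is only one of many possible choices. Under this assumed g a multi-step draw can still happen to equal +1 or -1.

```python
import random
from statistics import pvariance


def tpm_step(rng, p, multi_step_sd):
    """Return the change in repeat number for one mutation under the TPM."""
    if multi_step_sd < 1.0:
        raise ValueError("multi_step_sd should be at least 1 for this sketch")
    if rng.random() < p:
        # One step mutation: gain or lose a single repeat with equal chance.
        return rng.choice((-1, 1))
    # Multi-step mutation. Assumption: g is a rounded zero-mean normal;
    # draws that round to 0 are discarded so that every mutation changes size.
    while True:
        delta = round(rng.gauss(0.0, multi_step_sd))
        if delta != 0:
            return delta


def simulate_steps(p, multi_step_sd, n=20_000, seed=1):
    """Draw n mutational changes with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [tpm_step(rng, p, multi_step_sd) for _ in range(n)]


for p in (1.0, 0.9, 0.0):
    steps = simulate_steps(p, multi_step_sd=5.0)
    print(f"p = {p}: variance of step size = {pvariance(steps):.2f}")
```

Setting p=1 recovers strictly stepwise behaviour, while p=0 makes every mutation a draw from g, which is the sense in which the text treats that extreme as IAM-like.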
The data of Di Rienzo et al. (1994) also suggest that not all microsatellite loci evolve at the same rate. Di Rienzo et al. (1994) suggest that as the variance in repeat number increases, the frequency and/or magnitude (i.e. the number of repeats gained or lost) of multi-step mutations increases. This is consistent with multi-step mutations being caused by unequal crossing over, and may explain the questionable association between repeat length and mutation rate. Weber (1990) and Hudson et al. (1992) concluded that longer repeats were more polymorphic, suggesting a higher mutation rate; however, others have found these data equivocal (Valdes et al. 1993). Although studies in prokaryotes suggest that longer fragments have a higher rate of slippage (Levinson and Gutman 1987b), in vitro studies have found no such association (Schlotterer and Tautz 1992). The reason why the rate of unequal exchange should increase with the variance in repeat number is straightforward. As the variance in repeat number increases, the probability that a crossing over event causes an unequal exchange of more than one repeat unit should increase. Even if the rate of crossing over remains constant, the rate of unequal exchange should therefore increase. Because an unequal exchange does not change the mean size of the two original alleles (i.e. one smaller and one larger allele are generated), the overall mean allele size should not change as a result of this mutational process. Valdes et al. (1993) found no correlation between variance in repeat number and mean allele size, or between number of alleles and mean allele size. There is, however, a clear linear relationship between variance in repeat number and number of alleles (Valdes et al. 1993), as would be predicted by an increased rate of unequal exchange. This relationship is also predicted by the SMM (Valdes et al. 1993).
It remains to be tested whether the observed relationship is more realistically modelled by the SMM, or by a TPM which incorporates an increased amount of unequal exchange as variance in repeat number increases.
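The argument that unequal exchange leaves the mean allele size unchanged while increasing the spread of allele sizes can be checked with a small Python sketch. It assumes a deliberately simplified picture of unequal exchange, in which a misalignment of `offset` repeat units produces one allele that gains that many repeats and one that loses them. The function name and the parental allele sizes are hypothetical.

```python
from statistics import mean, pvariance


def unequal_exchange(a, b, offset):
    """Return the two recombinant allele sizes after a misaligned exchange.

    a, b: repeat numbers of the two parental alleles.
    offset: number of repeat units by which the alleles are misaligned;
            one product gains this many repeats and the other loses them.
    """
    if offset < 0 or offset >= b:
        raise ValueError("offset must satisfy 0 <= offset < b")
    return a + offset, b - offset


# Hypothetical parental alleles of equal size.
parents = (12, 12)
for offset in (1, 3):
    products = unequal_exchange(*parents, offset)
    print(offset, products, mean(products), pvariance(products))
```

In this toy case the mean of the two products stays at 12 for either offset, while the variance between them grows with the size of the misalignment, which matches the reasoning in the text.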
Implications of the microsatellite evolutionary process
It is clear that the evolution of microsatellites is a complex
mutational process involving at least two different mechanisms. The
prevalence of each mechanism varies according to the structure of the
repeat unit itself. Three to five bp repeats seem to evolve via the
SMM while 1-2 bp repeats are more like the TPM. Further, the
prevalence of single step mutations decreases as the complexity of the
repeat core increases. Although much further work is needed to fully
understand the mutational processes, it is clear that care should be
taken when interpreting microsatellite results. Most importantly, data
from the different classes of microsatellites described above should
not be treated as a single homogeneous data set. The prevalence of
different mutational events may vary dramatically among the groups,
confounding the interpretation of the data. Based on this idea, it has
been suggested that microsatellites that have a more IAM-like evolution
(i.e. composite repeats) should be those best suited to study
population questions such as population subdivision and genetic
relationships (Estoup et al. 1995a, 1995b) since they will contain the
lowest levels of homoplasy.