Prior to analysing data for population substructure, it is essential to test the genetic variation found at microsatellites to ensure that the basic assumptions of the subsequent theory are not violated. Three main assumptions need to be tested. First, the selective neutrality of each locus should be assessed. Second, the presence of 'null alleles' (alleles that are not detected by PCR analysis) should be identified. Finally, before the data from different loci are combined, the independent assortment of the loci must be tested.
The assumption that microsatellite loci are selectively neutral is the key tenet behind most analyses of these data. All of the subsequent analyses rest on the interaction of genetic drift (random change in allele frequency), mutation and/or migration. Over time, drift and mutation lead to the divergence of allele frequencies among subpopulations, whereas migration homogenises allele frequencies. Strong selection may overcome these forces. Selection at a locus may stabilise allele frequencies across all subpopulations (e.g. via overdominance) and therefore lead to an underestimate of population substructure or genetic distance. Alternatively, differences in selective pressures among regions may cause the fixation of alternative alleles in different subpopulations and lead to an overestimate of these parameters. Because the effects of selection can confound results, any loci under selection should be excluded from the analysis. Although the vast majority of microsatellites are believed to be neutral, some are linked to loci under selection. One only has to look at the wealth of information on human genetic diseases uncovered through microsatellite loci tightly linked to candidate 'disease loci' to see that this problem is not trivial (e.g. Robinson et al. 1996). Since the exact location of most microsatellites used in the analysis of wildlife populations is unknown, detecting the effects of selection becomes an important first step in any analysis.
The comparison of observed genotype frequencies with those expected
under Hardy-Weinberg equilibrium may detect the presence of selection.
This comparison is not straightforward at microsatellite loci because
the combination of modest sample sizes and a large number of alleles
leaves many genotype classes with very small expected counts. Thus,
data are usually pooled before comparison with expected values to
increase the power of the tests. Various tests may be conducted, and
comparisons at several levels are recommended. A selection of these
tests is described below. First,
observed and expected levels of heterozygosity can be compared (e.g.
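Edwards et al. 1992). An unbiased estimate of the expected
heterozygosity is

$$\hat{H}_e = \frac{2n}{2n-1}\left(1 - \sum_{i} p_i^{2}\right),$$

where $n$ is the sample size (number of diploid individuals) and $p_i$
is the frequency of the $i$th allele. In addition, the data for rare
alleles can be pooled. A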
common way of doing this is to compare three classes: homozygotes for
the most common allele, heterozygotes carrying the most common allele,
and all remaining genotypes (e.g. Gottelli et al. 1996). Finally, the
observed and expected numbers in every genotype class can be compared
(e.g. Allen et al. 1995; Edwards et al.
1992). This final comparison should be conducted with caution, because
one of the common ways to test its significance, the likelihood-ratio
test (G-statistic; Sokal and Rohlf 1981), is not well approximated by
its usual chi-square distribution when there are many alleles and only
a moderate sample size (Edwards et al. 1992). Thus, levels of
statistical significance have to be estimated by permutation of the
data (Deka et al. 1991).
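The first test described above, comparing observed heterozygosity with the unbiased expected heterozygosity, can be sketched in Python as follows. This is a minimal illustration only, assuming that each diploid individual is stored as a pair of allele labels (here, fragment sizes in base pairs); the function names and the example sample are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter

# Hypothetical helpers: each genotype is a pair of allele labels
# (e.g. fragment sizes) scored in one diploid individual.

def allele_frequencies(genotypes):
    """Return the frequency p_i of each allele across all 2n gene copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return {allele: count / total for allele, count in counts.items()}

def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)

def expected_heterozygosity(genotypes):
    """Unbiased expected heterozygosity: 2n/(2n-1) * (1 - sum of p_i^2)."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    return (2 * n / (2 * n - 1)) * (1 - sum(p * p for p in freqs.values()))

# Hypothetical sample of five individuals at one locus.
sample = [(150, 152), (150, 150), (152, 154), (150, 154), (150, 150)]
print(observed_heterozygosity(sample))             # 0.6
print(round(expected_heterozygosity(sample), 4))   # 0.6222
```

Here the allele frequencies are 0.6, 0.2 and 0.2, so the expected heterozygosity is (10/9)(1 - 0.44) ≈ 0.622, slightly above the observed 0.6. With real data, such a difference would then be tested for significance rather than judged by eye.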
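The three-class pooling described above can also be illustrated with a short sketch. It assumes the same pair-of-alleles representation, and the helper `three_class_table` is hypothetical. Under Hardy-Weinberg equilibrium, with $p$ the frequency of the most common allele and $n$ the number of individuals, the expected counts are $np^2$, $2np(1-p)$ and $n(1-p)^2$, and these sum to $n$.

```python
from collections import Counter

# Hypothetical helper: pools genotypes into three classes and returns
# observed and Hardy-Weinberg expected counts for each class.

def three_class_table(genotypes):
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    # Most common allele; ties go to the allele that was counted first.
    common, count = allele_counts.most_common(1)[0]
    p = count / (2 * n)

    # [homozygotes for common allele, heterozygotes with it, all others]
    observed = [0, 0, 0]
    for a, b in genotypes:
        if a == common and b == common:
            observed[0] += 1
        elif common in (a, b):
            observed[1] += 1
        else:
            observed[2] += 1

    # Expected counts under HWE: n*p^2, n*2p(1-p), n*(1-p)^2.
    expected = [n * p ** 2, n * 2 * p * (1 - p), n * (1 - p) ** 2]
    return common, observed, expected

sample = [(150, 152), (150, 150), (152, 154), (150, 154), (150, 150)]
common, observed, expected = three_class_table(sample)
print(common, observed, [round(e, 2) for e in expected])  # 150 [2, 2, 1] [1.8, 2.4, 0.8]
```

The "all others" class absorbs both rare homozygotes and heterozygotes between two rare alleles. Combining them this way keeps the expected count in each class from becoming too small to test.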
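The comparison across all genotype classes, with significance estimated by permutation, might be sketched as below. This sketch rests on two assumptions. First, G is computed as $G = 2\sum O \ln(O/E)$, with expected counts taken from Hardy-Weinberg proportions. Second, the permutation shuffles all alleles among individuals. That shuffle keeps allele frequencies fixed while breaking up the original genotypes, which is one common scheme; the procedure of Deka et al. (1991) may differ in detail. The function names and the example data are hypothetical, and allele labels must be sortable (e.g. integers).

```python
import math
import random
from collections import Counter

def hwe_g_statistic(genotypes):
    """G = 2 * sum(O * ln(O / E)) over genotype classes, with E from HWE.

    Classes with O = 0 contribute nothing, so only observed genotypes are summed.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: count / (2 * n) for allele, count in allele_counts.items()}
    observed = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    g = 0.0
    for (a, b), o in observed.items():
        e = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        g += o * math.log(o / e)
    return 2 * g

def permutation_test(genotypes, reps=999, seed=None):
    """Return (G_obs, p), with p estimated by shuffling alleles among individuals."""
    rng = random.Random(seed)
    g_obs = hwe_g_statistic(genotypes)
    alleles = [allele for genotype in genotypes for allele in genotype]
    hits = 0
    for _ in range(reps):
        rng.shuffle(alleles)  # keeps allele counts, breaks up genotypes
        permuted = list(zip(alleles[0::2], alleles[1::2]))
        if hwe_g_statistic(permuted) >= g_obs - 1e-9:  # tolerance for float ties
            hits += 1
    # Count the observed data as one of the permutations.
    return g_obs, (hits + 1) / (reps + 1)

# Hypothetical locus where every individual is the same heterozygote.
excess_heterozygotes = [(150, 152)] * 10
g, p = permutation_test(excess_heterozygotes, reps=999, seed=1)
print(round(g, 3), p)
```

Because the null distribution is built from the data themselves, this approach does not rely on the chi-square approximation, which breaks down when many genotype classes have small expected counts.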