
Chargaff's Rules

Chargaff's rules express the fact that double-stranded DNA (dsDNA) obeys Watson-Crick base pairing. The two strands of dsDNA are sometimes called Watson and Crick. Chargaff's rules are $A_w = T_c$, $T_w = A_c$, $C_w = G_c$ and $G_w = C_c$, where the letters represent the molar fraction of a base on one strand and the subscripts $w$ and $c$ denote the Watson and Crick strands. Less well known is the fact that Chargaff's rules apply approximately and separately to each of the two strands of dsDNA. That is, $A_w \approx T_w$, $C_w \approx G_w$, $A_c \approx T_c$ and $C_c \approx G_c$. Chargaff's second rules express the fact that complementary strands are approximately symmetric in nucleotide content. If they are true, then $A_w \approx A_c$, $T_w \approx T_c$, $C_w \approx C_c$ and $G_w \approx G_c$. Departures from strand symmetry, or Chargaff asymmetries, can be expressed by the differences (A-T)/(A+T) and (C-G)/(C+G) for each strand (the values for the two strands are linked by Chargaff's first rules).
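The two asymmetry measures can be computed directly from base counts. The following is a minimal sketch, assuming an uppercase or lowercase sequence over A, C, G and T; the function names and the toy sequence are illustrative and not part of the original notes. It also shows that, because of Chargaff's first rules, the complementary strand has the same values with opposite sign.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def chargaff_asymmetry(seq):
    """Return ((A-T)/(A+T), (C-G)/(C+G)) for one strand.

    A component is None when its denominator is zero.
    """
    n = Counter(seq.upper())
    at = n["A"] + n["T"]
    cg = n["C"] + n["G"]
    at_asym = (n["A"] - n["T"]) / at if at else None
    cg_asym = (n["C"] - n["G"]) / cg if cg else None
    return at_asym, cg_asym


watson = "AAAGGGCT"  # toy sequence, for illustration only
crick = reverse_complement(watson)

print(chargaff_asymmetry(watson))  # asymmetries on one strand
print(chargaff_asymmetry(crick))   # same magnitudes, opposite signs
```

Returning None for an empty denominator is a design choice of this sketch; a caller could equally skip such segments.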

Strand symmetry originates from identical mutation/substitution processes affecting each strand. For example, a substitution on one strand (say $A \rightarrow G$) has the same probability as the complementary substitution ($T \rightarrow C$). However, some mutation processes are known to be strand asymmetric (Francino and Ochman, 1997). Furthermore, nucleotide substitution is subject to selection, which may depend on information contained in only one strand.
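The link between strand-symmetric substitution and Chargaff's second rules can be checked numerically. The sketch below assumes a simple per-generation substitution matrix in which every change has the same probability as its complementary change; the six probabilities are hypothetical values chosen only for illustration. Iterating the process from a skewed starting composition should drive the strand towards A = T and C = G.

```python
BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_symmetric_matrix(rates):
    """Build a substitution matrix P[x][y] from six change probabilities.

    Each complementary change (e.g. T->C for A->G) is given the same
    probability, which is the strand-symmetry assumption.
    """
    P = {x: {y: 0.0 for y in BASES} for x in BASES}
    for (x, y), p in rates.items():
        P[x][y] = p
        P[COMP[x]][COMP[y]] = p
    for x in BASES:
        # Probability of no change is whatever is left over.
        P[x][x] = 1.0 - sum(P[x][y] for y in BASES if y != x)
    return P


def equilibrium(P, start, steps=5000):
    """Apply the substitution matrix repeatedly to base frequencies."""
    freq = dict(start)
    for _ in range(steps):
        freq = {y: sum(freq[x] * P[x][y] for x in BASES) for y in BASES}
    return freq


# Hypothetical probabilities; the six complementary changes are filled in.
rates = {
    ("A", "C"): 0.010, ("A", "G"): 0.030, ("A", "T"): 0.005,
    ("C", "A"): 0.020, ("C", "G"): 0.004, ("C", "T"): 0.050,
}
P = strand_symmetric_matrix(rates)
start = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}  # deliberately asymmetric
eq = equilibrium(P, start)
print(eq)  # A and T converge to one value, C and G to another
```

Making any single probability differ from that of its complementary change breaks the symmetry and, in general, leaves a residual Chargaff asymmetry at equilibrium.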

Replication Asymmetry: The leading- and lagging-strands are replicated by different mechanisms. The leading-strand is copied by a continuous process, while the lagging strand is synthesized discontinuously using multiple, short RNA primers. Additional enzymes are needed to synthesize primers and then later remove them and fill in gaps. Leading- and lagging-strand replication may involve different polymerases with disparate error rates. As well, the structure of the replication fork exposes the leading- and lagging-strands to different environments. The lagging-strand is more open as a longer, single-stranded structure which could lead to increased DNA damage.

Mutagenesis experiments in E. coli have shown that deletions and replication errors are more frequent on the lagging strand. Differences depend on the agent inducing replication errors. Excess dTTP causes more errors on the lagging strand, while excess dCTP makes little difference. In general, it seems that $Y \rightarrow R$ (pyrimidine $\rightarrow$ purine) changes are expected to be more frequent on the lagging strand, causing an accumulation of purines.

Replication bias produces a switch in Chargaff asymmetry across a replication origin because at this point the leading- and lagging-strands change identity.
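A switch of this kind can be looked for by comparing the asymmetry on either side of a candidate position. The following is a rough sketch under stated assumptions: the function names, the flank size and the toy sequence with an "origin" at position 1000 are all hypothetical, and real chromosomes would need larger flanks and a treatment of coding-sequence bias.

```python
def cg_asymmetry(seq):
    """(C-G)/(C+G) for one strand; None when the segment has no C or G."""
    c, g = seq.count("C"), seq.count("G")
    return (c - g) / (c + g) if c + g else None


def asymmetry_across(seq, position, flank=1000):
    """Return (C-G)/(C+G) in the flanks left and right of `position`."""
    left = seq[max(0, position - flank):position]
    right = seq[position:position + flank]
    return cg_asymmetry(left), cg_asymmetry(right)


# Toy strand: G-rich before position 1000, C-rich after it.
toy = "GGC" * 333 + "G" + "CCG" * 333 + "C"
left, right = asymmetry_across(toy, 1000)

# A sign change between the flanks is the pattern expected at an origin.
switched = left is not None and right is not None and left * right < 0
print(left, right, switched)
```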

Figure: Chargaff asymmetries in the D. melanogaster Adh region, calculated in windows of 100 nucleotides overlapping by 50 nucleotides.
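A windowed profile like the one in the figure could be produced along the following lines. This is a minimal sketch, assuming windows of 100 nucleotides advanced in steps of 50 (one reading of the caption); the function name and the toy sequence are illustrative and do not reproduce the actual Adh data.

```python
def window_asymmetries(seq, window=100, step=50):
    """Yield (start, (A-T)/(A+T), (C-G)/(C+G)) for each full window."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        a, c, g, t = (chunk.count(b) for b in "ACGT")
        at_asym = (a - t) / (a + t) if a + t else None
        cg_asym = (c - g) / (c + g) if c + g else None
        yield start, at_asym, cg_asym


# Hypothetical sequence with an abrupt change in composition.
toy = "A" * 100 + "G" * 50 + "C" * 50
for row in window_asymmetries(toy):
    print(row)
```

Windows without any A/T (or C/G) report None for that component rather than dividing by zero; trailing bases that do not fill a whole window are ignored.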

Lobry (1996) analyzed chromosomes of several bacteria for replication bias. The expected switch in strand asymmetry occurred across the replication origin. Changes in (C-G)/(C+G) were much more dramatic than changes in (A-T)/(A+T). The replication effect was partly obscured by protein-coding sequences, which introduce their own bias. Wherever one strand had a higher density of coding sequences, that strand showed an increased excess of G over C and of T over A. Contrary to the expectation from mutagenesis, the lagging strand accumulated more A and C (instead of A and G).

No evidence has been found in eukaryotes for replication bias. Chargaff asymmetries switch rapidly over short regions of the chromosome, although generally they are higher around protein-coding exons (see the figure of the D. melanogaster Adh region). Apparently, the effect of other mutational biases and/or codon selection obscures the asymmetry caused by replication (if present!).

Transcriptional Asymmetry: Transcription can also introduce Chargaff asymmetry since the two strands may be subject to different mutational effects. During transcription, the non-template strand is in an open single-stranded conformation that is more sensitive to certain mutations such as $C \rightarrow T$ (U) deamination. The template strand, on the other hand, may be subject to transcription-dependent repair. DNA damage (for example a pyrimidine dimer) can stall the RNA polymerase and promote the action of nucleotide excision repair. This repair may be error-prone, inducing mutations on the template strand. Alternatively, unrepaired damage on the non-template strand may lead to substitution.

Codon Selection: Codon selection obscures the effect of transcription asymmetry. (I know of no study that has compared transcribed but non-translated regions in a way that would test for transcription-dependent asymmetry. Analyses in bacteria have detected the effect of replication asymmetry, see Francino and Ochman, 1997).

When the DNA sequence is protein-coding, selection for specific amino acids will produce strand asymmetry. For example, suppose selection favors glycine in a protein. Thus, GGN codons tend to occur on one strand and of course complementary NCC nucleotides on the other. The content of G increases relative to C in the sense strand. That is, protein amino acid composition can impose strand asymmetry. Certain kinds of codon bias in the synonymous position can also produce strand asymmetry.
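The glycine example can be made concrete with a toy coding sequence. This is a minimal sketch: the sequence is hypothetical (a start codon, twenty GGN glycine codons and a stop codon) and the function name is illustrative. The sense strand shows a strong excess of G over C, and the complementary strand shows the mirror-image excess of C over G.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cg_asymmetry(seq):
    """(C-G)/(C+G) for one strand; assumes the strand has some C or G."""
    c, g = seq.count("C"), seq.count("G")
    return (c - g) / (c + g)


# Hypothetical glycine-rich open reading frame (sense strand).
glycine_codons = "GGT" + "GGA" + "GGC" + "GGG"
sense = "ATG" + glycine_codons * 5 + "TAA"
antisense = sense.translate(COMPLEMENT)[::-1]

print(sense.count("G"), sense.count("C"))  # G greatly exceeds C
print(cg_asymmetry(sense))                 # negative: excess of G
print(cg_asymmetry(antisense))             # positive: excess of C
```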

