Institute of Cell, Animal, and Population Biology, University of Edinburgh, U.K.
Laboratoire "Génome, Populations, Interactions", Université Montpellier 2, France
Base compositions (A-, C-, G- and T-percent) are highly variable among genes and genomes. Heterogeneous base compositions have been observed in most taxonomic groups sampled, in organellar and nuclear genomes, and in coding and non-coding regions. Base composition has functional implications: depending on the organism, it relates to codon usage, gene density, and resistance to high temperature. Unequal base compositions between genomes or between homologous genes imply that distinct lineages have undergone distinct evolutionary processes. This raises several questions. First, how robust are DNA sequence analysis methods, especially phylogenetic inference methods, when base composition varies between the compared sequences? Second, the history of diverging base compositions deserves attention: what were the ancestral states, and which lineages experienced severe compositional changes? Finally, the mechanisms of compositional divergence are unknown in most cases: what evolutionary forces underlie the observed changes in base composition? Is natural selection shaping genomic base compositions, or is the variation between genomes mainly due to variable mutation processes?
These questions illustrate the dual goal of molecular evolution, namely (i) recovering the history of species and populations through that of their genomes, and (ii) better understanding the structure and function of genomes from an evolutionary perspective. The questions above are addressed using a non-homogeneous, non-stationary model of DNA sequence evolution, which allows GC-content to diverge over time and between lineages (Galtier & Gouy 1995, 1998). Maximum-likelihood analyses based on this model make it possible to (i) correctly estimate phylogenies when GC-content varies between sequences, and (ii) estimate ancestral base compositions. The latter possibility is applied to ribosomal RNA sequences from species sampled in all three domains of life (Galtier et al. 1999), yielding evidence that the last universal common ancestor was not a thermophilic organism.
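As an illustration of what "non-homogeneous, non-stationary" means for GC-content, the following minimal Python sketch propagates an expected GC fraction down a toy tree in which each branch has its own equilibrium GC-content. It assumes a Tamura (1992)-style substitution process with a transition/transversion ratio `kappa`, for which the expected GC fraction relaxes exponentially toward the branch equilibrium at rate (kappa + 1)/2 in unnormalized time units. The tree, branch lengths, parameter values, and function names are hypothetical and are not taken from the cited papers; this is not the authors' implementation.

```python
import math


def expected_gc(gc_start, gc_equilibrium, branch_length, kappa=2.0):
    """Expected GC fraction at the end of one branch.

    Assumption: Tamura-1992-style rates, where A/T -> G/C occurs at rate
    gc_equilibrium * (kappa + 1) / 2 and G/C -> A/T at rate
    (1 - gc_equilibrium) * (kappa + 1) / 2 (unnormalized time units).
    """
    relaxation_rate = (kappa + 1.0) / 2.0
    decay = math.exp(-relaxation_rate * branch_length)
    return gc_equilibrium + (gc_start - gc_equilibrium) * decay


def tip_gc(node, gc_above, kappa=2.0):
    """Return {tip name: expected GC} for the subtree rooted at `node`.

    `node` is (name, branch_length, branch_gc_equilibrium, children);
    `gc_above` is the GC fraction at the top of the node's branch.
    """
    name, length, gc_eq, children = node
    gc_here = expected_gc(gc_above, gc_eq, length, kappa)
    if not children:
        return {name: gc_here}
    result = {}
    for child in children:
        result.update(tip_gc(child, gc_here, kappa))
    return result


# Hypothetical three-taxon tree: each branch has its own equilibrium GC
# (non-homogeneous), and the root GC need not match any of them
# (non-stationary).
toy_tree = ("root", 0.0, 0.5, [
    ("A", 0.3, 0.7, []),          # GC-rich lineage
    ("inner", 0.1, 0.5, [
        ("B", 0.2, 0.3, []),      # GC-poor lineage
        ("C", 0.2, 0.5, []),      # lineage at the ancestral composition
    ]),
])

tips = tip_gc(toy_tree, gc_above=0.5)
for name, gc in sorted(tips.items()):
    print(f"{name}: expected GC = {gc:.3f}")
```

In this sketch the ancestral GC fraction is an input; in a maximum-likelihood analysis it would instead be one of the parameters estimated from the aligned sequences.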
Galtier, N., and Gouy, M. 1995. Inferring phylogenies from sequences of unequal base compositions. Proc. Natl. Acad. Sci. USA 92: 11317-11321.
Galtier, N., and Gouy, M. 1998. Inferring pattern and process: maximum likelihood implementation of a non-homogeneous model of DNA sequence evolution for phylogenetic analysis. Mol. Biol. Evol. 15: 871-879.
Galtier, N., Tourasse, N.J., and Gouy, M. 1999. A non-hyperthermophilic ancestor to extant life forms. Science 283: 220-221.