Research article
On the evolution rate in mammalian mitochondrial genomes
Highlights
► We present a simple and effective method to estimate the evolution rate.
► The evolution rates at the first and second codon positions are correlated.
► The rate distribution at the third codon position is a mixture distribution.
Introduction
The analysis of mitochondrial genomes (mtDNA) has been a powerful tool for understanding evolution and for phylogenetic inference, because mtDNA exhibits several special features: maternal inheritance, single-copy orthologous genes, lack of recombination, a high mutation rate and a low level of DNA repair.
The evolution rate is of fundamental importance for understanding evolution and for phylogenetic inference. Most methods of phylogenetic inference model single nucleotide substitutions and assume that they occur as independent draws from a single probability distribution across the sequence (Bofkin and Goldman, 2007, Yang, 2006, Felsenstein, 2004), but this assumption may be unrealistic for real data. Over the last two decades, a growing body of evidence has indicated that the rates of nucleotide and amino acid substitution are not constant among sites (Uzzell and Corbin, 1971, Bousquet et al., 1992, Gaut et al., 1992, Martin et al., 1992, Martin and Palumbi, 1993, Laroche et al., 1995, Nei and Kumar, 2000, Yang et al., 1998, Felsenstein, 2001). In protein-coding sequences in particular, the three codon positions evolve at different rates: the third codon position evolves much faster than the first and second, mainly because of selective pressure. Furthermore, some rate heterogeneity can be observed within each of the three codon positions because of the specific functional constraints acting on each site.
Codon models, first introduced by Goldman and Yang (1994; the GY94 model), have been developed to improve the reliability of phylogenetic reconstruction for protein-coding sequences (Goldman and Yang, 1994, Yang et al., 2000, Schadt and Lange, 2002, Ren et al., 2005). These models have been shown to provide a substantially improved fit to protein-coding sequences by accounting for the context dependency that arises from the triplet nature of the genetic code (recently reviewed by Anisimova and Kosiol, 2009, Delport et al., 2008). Full codon models are computationally expensive compared with standard nucleotide substitution models, because the 61 sense codons encode only 20 amino acids, which leads to redundancy. So-called codon position (CP) models, which treat each codon position as a separate category, have also been presented (Bofkin and Goldman, 2007, Shapiro et al., 2006). Bofkin and Goldman (2007) investigated the variation in evolutionary processes at different codon positions and suggested that the first and second codon positions could be grouped together in a single category in future investigations. However, the correlation between codon positions has not been studied comprehensively. Examining the correlation of evolution rates between different codon positions is necessary to help us understand the mechanisms of molecular evolution and to build more precise mathematical models for phylogenetic reconstruction.
Different approaches have been used so far to estimate the evolution rate, such as the maximum parsimony method (Hasegawa et al., 1993, Wakeley, 1993) and the maximum likelihood method (Siepel and Haussler, 2004, Excoffier and Yang, 1999, Meyer et al., 1999, Felsenstein, 1981, Yang, 1993, Yang, 1994, Yang, 1996, Yang and Kumar, 1996). These approaches all depend on a phylogenetic tree reconstructed from a multiple sequence alignment, and they add an a priori determination of the substitution rate parameters to the predetermination of the tree. Furthermore, these methods assume that the evolutionary process is independent across the sites of a sequence, and they spend too much time reconstructing the phylogenetic tree and estimating the ancestral sequences (Felsenstein, 2004). Tree-based methods are therefore not well suited to estimating evolution rates accurately in large datasets, although they work effectively on relatively small ones.
We present here a simple and effective method to compare the evolution rates at the different codon positions of mammalian mitochondrial genomes and to analyze the correlation between these positions. It is based on the simple idea that a higher evolution rate increases the uncertainty of the nucleotide at a site. For our dataset of 123 mtDNA sequences, we use Shannon entropy (Cover and Thomas, 2006) to estimate the evolution rates and to analyze the correlation of evolution rates between the positions. In general, entropy measures the degree of uncertainty of a random variable. When the variable is constant, the entropy attains its minimum, zero; the uniform distribution attains the maximum entropy. For aligned sequences, we regard the nucleotide at a given site as a random variable; the slower the evolution rate at this site, the closer the entropy of this variable is to zero. Entropy is well suited to analyzing the correlation between the positions because it makes no assumption about whether the sites are independent or dependent, although it does not allow us to estimate divergence times accurately. In addition, computing the entropy is so simple that its run time is almost negligible. Although conceptually very simple, the entropy proves very effective for analyzing the distribution of the evolution rates and the correlation between the evolution rates at different codon positions.
Gamma distribution models are considered suitable for describing the variation in evolution rate among amino acid or nucleotide sites (Yang, 2006, Felsenstein, 2004, Yang and Kumar, 1996, Gu and Zhang, 1997, Mayrose et al., 2005). Although gamma mixture models accounting for among-site rate heterogeneity have been proposed for protein sequences (Mayrose et al., 2005), a mixture distribution model for the codon positions is still absent. Our findings reveal that the evolution rates at the 3rd codon positions of mammalian mitochondrial genomes also do not fit a single gamma distribution well. We also analyzed the correlation of the evolution rates between the different codon positions; the results imply that the evolution rates at the 1st and 2nd codon positions are positively correlated, and that the evolution rates at the 3rd codon positions are more weakly correlated with those at the other two codon positions.
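To make the idea concrete, the following is a minimal Python sketch of a per-site Shannon entropy, H = -sum over bases b of p_b log2 p_b, where p_b is the observed frequency of base b in one alignment column. It is an illustration under stated assumptions rather than the implementation used in this study: the function names, the toy alignment, the choice of base-2 logarithms and the decision to ignore gaps and ambiguity codes are all assumptions that the text does not specify.

```python
import math
from collections import Counter


def column_entropy(column, alphabet="ACGT"):
    """Shannon entropy (in bits) of the nucleotides in one alignment column.

    Characters outside `alphabet` (gaps, ambiguity codes) are ignored; this
    is an assumption of the sketch, not something stated in the text.
    """
    counts = Counter(base for base in column.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # a column with no usable bases carries no information
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def site_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Hypothetical toy alignment: four sequences, four sites.
toy_alignment = [
    "ATGA",
    "ATGC",
    "ATAG",
    "ATAT",
]
entropies = site_entropies(toy_alignment)
```

In this toy alignment the two fully conserved columns have zero entropy, the column split evenly between two bases has an entropy of 1 bit, and the column in which all four bases are equally frequent reaches the maximum of 2 bits, matching the minimum and maximum cases described above.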
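One way such a correlation analysis could be set up is sketched below in Python: per-site entropies of an in-frame coding alignment are split by codon position, and the values at two positions are compared codon by codon. This is only a sketch under assumptions. The text here does not say which correlation coefficient was used, so the Pearson coefficient is a stand-in, and the function names and the entropy values are hypothetical.

```python
import math


def split_by_codon_position(site_values):
    """Split per-site values of an in-frame coding alignment into three lists,
    one per codon position. Trailing sites of an incomplete codon are dropped."""
    usable = len(site_values) - len(site_values) % 3
    return tuple(site_values[offset:usable:3] for offset in range(3))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Pearson is an assumption of this sketch; a rank-based coefficient
    could be substituted.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-site entropies for four codons (12 sites), in site order.
toy_entropies = [0.1, 0.0, 1.5, 0.4, 0.2, 1.9, 0.2, 0.1, 1.2, 0.6, 0.3, 1.6]
pos1, pos2, pos3 = split_by_codon_position(toy_entropies)
r12 = pearson(pos1, pos2)
r13 = pearson(pos1, pos3)
r23 = pearson(pos2, pos3)
```

Pairing the values codon by codon means that each coefficient measures whether codons that are variable at one position also tend to be variable at another.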
Section snippets
Materials and methods
Complete mammalian mitochondrial DNA genome sequences and the corresponding 13 protein-coding gene sequences of these genomes were extracted from 123 different species. The organisms and the accession numbers of the corresponding complete mtDNA sequences are listed in Table 1. We translated these protein-coding genes into the corresponding proteins, aligned the protein sequences with CLUSTALW using its default options, examined the alignments by eye, and then back-translated the aligned protein sequences to DNA
Descriptive statistics
We calculated the mean and standard deviation of the entropy at each codon position of the 13 protein-coding genes in mammalian mitochondrial genomes. The results are shown in Table 2. Across all 13 genes, the mean entropy at a given codon position is roughly constant. Between codon positions, however, the entropy varies widely: the entropies at the 3rd codon position are much higher than those at the other two codon positions. We use the Wilcoxon rank sum test to compare the mean
Discussion and conclusion
It is well known that the evolution rate varies over sites. But it is not enough to consider only the variation over nucleotide or amino acid sites. In fact, many codon substitution models are now used to reconstruct phylogenetic trees (Yang and Bielawski, 2000, Yang, 2002), and they greatly improve the accuracy of the reconstruction. Unfortunately, most of these models assume that the evolution rates at each codon position are independently and identically distributed and follow gamma
Acknowledgments
This research work was supported by the Project of Sciences and Technology in Tianjin (08ZCGHHZ00200), and the National Natural Science Foundation of China under the Grants 20836005 and 10671100.
References (37)
Inference of selection from multiple species alignments. Curr. Opin. Genet. Dev. (2002)
Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. (2000)
Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics (2004)
Investigating protein-coding sequence evolution with probabilistic codon substitution models. Mol. Biol. Evol. (2009)
Variation in evolutionary processes at different codon positions. Mol. Biol. Evol. (2007)
Extensive variation in evolutionary rate of rbcL gene sequences among seed plants. Proc. Natl. Acad. Sci. U.S.A. (1992)
Elements of Information Theory (2006)
Models of coding sequence evolution. Brief. Bioinform. (2008)
Substitution rate variation among sites in mitochondrial hypervariable region I of humans and chimpanzees. Mol. Biol. Evol. (1999)
Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. (1981)
Taking variation of evolutionary rates between sites into account in inferring phylogenies. J. Mol. Evol.
Inferring Phylogenies
A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol.
Relative rates of nucleotide substitution at the rbcL locus of monocotyledonous plants. J. Mol. Evol.
A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol.
A simple method for estimating the parameter of substitution rate variation among sites. Mol. Biol. Evol.
Toward a more accurate time scale for the human mitochondrial DNA tree. J. Mol. Evol.
Mitochondrial DNA and monocot–dicot divergence time. Mol. Biol. Evol.