Abstract
Hidden Markov models (HMMs) are effective tools for detecting series of statistically homogeneous structures, but they are not well suited to analysing complex structures such as DNA sequences. Numerous methodological difficulties arise when HMMs are used to model non-geometric distributions such as exon lengths, to distinguish genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to propose new tools for exploring genome data. We show that, by introducing macro-states, HMMs can be used to analyse complex gene structures with bell-shaped length distributions. Our HMM methods take many biological properties into account and were developed to model the isochore organisation of the chimpanzee genome, which is considered a fundamental level of genome organisation. We identified a clear isochore structure in the chimpanzee genome, correlated with gene density and guanine-cytosine (GC) content.
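Why macro-states help can be seen numerically. A single HMM state with self-transition probability p is occupied for a geometrically distributed number of steps, whose mode is always 1, so it cannot reproduce a bell-shaped length distribution. Chaining n sub-states that share one emission distribution, each with self-loop probability p, yields a negative binomial duration P(L = l) = C(l-1, n-1)(1-p)^n p^(l-n) for l ≥ n, which is bell-shaped when n > 1. The Python sketch below assumes exactly this left-to-right construction; the function names are hypothetical, and the macro-states used in the paper may be built differently (for example with unequal sub-state probabilities).

```python
from math import comb


def macro_state_duration_pmf(n: int, p: float, max_len: int) -> list[float]:
    """Closed-form P(L = l), l = 0..max_len, for a macro-state assumed to be a
    left-to-right chain of n sub-states, each with self-loop probability p."""
    pmf = [0.0] * (max_len + 1)
    for l in range(n, max_len + 1):
        # Sum of n geometric sojourns (each >= 1 step): negative binomial.
        pmf[l] = comb(l - 1, n - 1) * (1 - p) ** n * p ** (l - n)
    return pmf


def duration_pmf_by_dp(n: int, p: float, max_len: int) -> list[float]:
    """Same distribution obtained by propagating probability through the chain."""
    alpha = [1.0] + [0.0] * (n - 1)  # step 1: the macro-state is entered in sub-state 0
    pmf = [0.0] * (max_len + 1)
    for l in range(1, max_len + 1):
        # Leaving the macro-state after step l requires exiting the last sub-state.
        pmf[l] = alpha[n - 1] * (1 - p)
        nxt = [0.0] * n
        for i in range(n):
            nxt[i] += alpha[i] * p  # stay in sub-state i
            if i + 1 < n:
                nxt[i + 1] += alpha[i] * (1 - p)  # advance to sub-state i + 1
        alpha = nxt
    return pmf


def mode(pmf: list[float]) -> int:
    """Index of the most probable duration."""
    return max(range(len(pmf)), key=pmf.__getitem__)


if __name__ == "__main__":
    geometric = macro_state_duration_pmf(1, 0.8, 200)
    bell_shaped = macro_state_duration_pmf(5, 0.8, 200)
    print(mode(geometric))  # → 1
    print(mode(bell_shaped))
```

Comparing the two functions checks the closed form against a direct forward computation over the sub-state chain. With n = 1 both reduce to the geometric case, while with n = 5 and p = 0.8 the most probable duration moves to about 20 steps, below the mean n/(1-p) = 25.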
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Melodelima, C., Gautier, C. (2007). A Markovian Approach for the Segmentation of Chimpanzee Genome. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_20
DOI: https://doi.org/10.1007/978-3-540-71233-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71232-9
Online ISBN: 978-3-540-71233-6
eBook Packages: Computer Science, Computer Science (R0)