Abstract
Models that rely exclusively on the Markov property, usually known as finite-context models, can represent DNA sequences without resorting to mechanisms that directly exploit exact and approximate repeats. These models provide probability estimates that depend on the recent past of the sequence and have been used for data compression. In this paper, we investigate some properties of finite-context models and use these properties to improve compression. The results are presented using the human genome as an example.
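As a rough illustration of how a finite-context model turns the recent past of a sequence into probability estimates, the following minimal sketch counts how often each symbol follows each order-k context and accumulates the ideal code length in bits. The function name `fcm_code_length`, the parameters `order` and `alpha`, and the additive-smoothing estimator are assumptions chosen for illustration; the abstract does not specify the estimator or the model orders used in the paper.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed four-symbol DNA alphabet


def fcm_code_length(sequence, order=2, alpha=1.0):
    """Ideal code length (in bits) of `sequence` under an adaptive order-`order`
    finite-context model with additive smoothing `alpha` (illustrative estimator)."""
    index = {symbol: i for i, symbol in enumerate(ALPHABET)}
    # counts[context][i] = times ALPHABET[i] has followed `context` so far
    counts = defaultdict(lambda: [0] * len(ALPHABET))
    total_bits = 0.0
    for pos, symbol in enumerate(sequence):
        # The context is the previous `order` symbols (shorter at the very start).
        context = sequence[max(0, pos - order):pos]
        ctx_counts = counts[context]
        # Probability estimate depends only on the recent past of the sequence.
        p = (ctx_counts[index[symbol]] + alpha) / (
            sum(ctx_counts) + alpha * len(ALPHABET)
        )
        total_bits += -math.log2(p)
        # Update the counts after coding, so a decoder could stay in sync.
        ctx_counts[index[symbol]] += 1
    return total_bits


if __name__ == "__main__":
    seq = "ACGT" * 50
    bits = fcm_code_length(seq, order=2)
    print(f"{bits / len(seq):.3f} bits per base")
```

On a highly regular input such as the repeated string above, the estimated cost falls well below the 2 bits per base of a uniform code, because each context quickly becomes a reliable predictor of the next symbol.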
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pratas, D., Pinho, A.J. (2011). Compressing the Human Genome Using Exclusively Markov Models. In: Rocha, M.P., Rodríguez, J.M.C., Fdez-Riverola, F., Valencia, A. (eds) 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011). Advances in Intelligent and Soft Computing, vol 93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19914-1_29
DOI: https://doi.org/10.1007/978-3-642-19914-1_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19913-4
Online ISBN: 978-3-642-19914-1
eBook Packages: Engineering, Engineering (R0)