Abstract
Tools that effectively analyze and compare sequences are of great importance in various areas of applied computational research, especially in molecular biology. In this paper, we introduce simple geometric criteria based on the notion of string linearity and use them to compare DNA sequences of various organisms, as well as to distinguish them from random sequences. Our experiments reveal a significant difference between biosequences and random sequences, the former having much higher deviation from linearity than the latter, as well as a general trend of increasing deviation from linearity from primitive to biologically complex organisms. The proposed approach is potentially applicable to the construction of dendrograms representing the evolutionary relationships among species.
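The abstract does not define "deviation from linearity", so the following is only an illustrative sketch of one possible reading, not the authors' criterion. It assumes a DNA string is first reduced to a binary word (here, hypothetically, purines A/G map to 1 and pyrimidines C/T to 0), that the word is read as a lattice path, and that the deviation is the largest vertical gap between that path and the straight line joining its endpoints. The names `encode_dna` and `deviation_from_linearity` are introduced for illustration only.

```python
def encode_dna(seq):
    """Map a DNA string to a binary word.

    Assumption: purines (A, G) -> 1, pyrimidines (C, T) -> 0.
    This encoding is a hypothetical choice, not taken from the paper.
    """
    return [1 if base in "AG" else 0 for base in seq.upper()]


def deviation_from_linearity(bits):
    """Largest gap between the running count of 1s and the chord
    through the endpoints of the path.

    Assumption: this is one simple geometric measure of how far a
    binary word is from a straight line; the paper may use another.
    """
    n = len(bits)
    if n == 0:
        return 0.0
    slope = sum(bits) / n  # slope of the line joining (0, 0) and (n, #ones)
    height = 0
    worst = 0.0
    for i, bit in enumerate(bits, start=1):
        height += bit
        worst = max(worst, abs(height - slope * i))
    return worst


if __name__ == "__main__":
    # An evenly alternating word stays close to its chord;
    # a word with all 0s before all 1s strays further from it.
    print(deviation_from_linearity([0, 1, 0, 1]))
    print(deviation_from_linearity([0, 0, 1, 1]))
    print(deviation_from_linearity(encode_dna("ACGT")))
```

Under this reading, a word whose symbols are spread evenly scores low, while long runs of one symbol push the score up, which is the kind of contrast the abstract reports between random sequences and biosequences.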
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Brimkov, B., Brimkov, V.E. (2014). Geometric Approach to Biosequence Analysis. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_12
DOI: https://doi.org/10.1007/978-3-319-07581-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07580-8
Online ISBN: 978-3-319-07581-5
eBook Packages: Engineering, Engineering (R0)