Abstract
In this work we present a genetic-algorithm-based approach to optimise weighted distance measures derived from compositional and physico-chemical properties of biological sequences. These distances allow a significant reduction of the computational cost associated with distance evaluation, while maintaining high accuracy when compared with traditional methodologies. The strategy has a generic and parametric formulation, and exhaustive tests have been performed to show its adaptability in optimising the weights over different compositions of sequence characteristics. These fast-evaluation distances can be used to deal with large sets of sequences, as is nowadays imperative, and appear as an important alternative to the traditional and expensive pairwise sequence similarity criteria.
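The idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of amino acid composition as the only property vector, the weighted Euclidean form of the distance, the Pearson-correlation fitness and all genetic-algorithm settings (elitist selection, one-point crossover, Gaussian mutation) are assumptions made here for illustration. The reference distances are assumed to come from a traditional, more expensive pairwise comparison supplied by the caller.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Relative frequency of each of the 20 standard amino acids (assumed feature set)."""
    n = max(len(seq), 1)
    return [seq.count(a) / n for a in AMINO_ACIDS]

def weighted_distance(x, y, w):
    """Weighted Euclidean distance between two property vectors (assumed form)."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)) ** 0.5

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def fitness(w, pairs, reference):
    """Agreement between the fast weighted distances and the reference distances."""
    fast = [weighted_distance(x, y, w) for x, y in pairs]
    return pearson(fast, reference)

def evolve(pairs, reference, n_weights, pop_size=30, generations=40, seed=0):
    """Hypothetical elitist GA over weight vectors with entries kept in [0, 1]."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population and keep the better half unchanged (elitism).
        scored = sorted(pop, key=lambda w: fitness(w, pairs, reference), reverse=True)
        elite = scored[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_weights)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            i = rng.randrange(n_weights)             # mutate a single weight
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, pairs, reference))
```

In use, `pairs` would hold the property vectors of each pair of sequences and `reference` the matching distances from the traditional method; once the weights are optimised, `weighted_distance` can be evaluated on new pairs without any alignment. Physico-chemical descriptors could be appended to the composition vector in the same way, with one weight per added component.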
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perez, O.M., Marin, F.J., Trelles, O. (2001). Improving Biological Sequence Property Distances by Using a Genetic Algorithm. In: Mira, J., Prieto, A. (eds) Bio-Inspired Applications of Connectionism. IWANN 2001. Lecture Notes in Computer Science, vol 2085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45723-2_65
DOI: https://doi.org/10.1007/3-540-45723-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42237-2
Online ISBN: 978-3-540-45723-7
eBook Packages: Springer Book Archive