Improving Biological Sequence Property Distances by Using a Genetic Algorithm

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2085)

Abstract

In this work we present a genetic-algorithm-based approach to optimising weighted distance measures built from compositional and physical-chemical properties of biological sequences. These measures allow a significant reduction of the computational cost associated with distance evaluation, while maintaining high accuracy when compared with traditional methodologies. The strategy has a generic and parametric formulation, and exhaustive tests have been performed to show its adaptability in optimising the weights over different compositions of sequence characteristics. These fast-evaluation distances can be used to deal with large sets of sequences, as is nowadays imperative, and appear as an important alternative to the traditional and expensive pairwise sequence similarity criteria.
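
The abstract does not specify the chromosome encoding, fitness function, or genetic operators used in the paper. As a rough illustration of the general idea only, the following Python sketch evolves the weights of a weighted Euclidean distance over per-sequence property vectors so that it approximates a set of reference distances. Everything here is an assumption made for illustration: the function names, the squared-error fitness, truncation selection, one-point crossover, Gaussian mutation, and the toy data (which stand in for reference distances that would, in practice, come from an expensive pairwise comparison).

```python
import random

def weighted_distance(u, v, w):
    """Weighted Euclidean distance between two property vectors."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, u, v)) ** 0.5

def fitness(w, pairs, targets):
    """Negative mean squared error against reference distances (higher is better)."""
    err = sum((weighted_distance(u, v, w) - t) ** 2
              for (u, v), t in zip(pairs, targets))
    return -err / len(pairs)

def evolve_weights(pairs, targets, n_weights, pop_size=30, generations=60,
                   mutation_sigma=0.1, seed=0):
    """Generic GA (hypothetical setup, not the paper's): needs n_weights >= 2."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda w: fitness(w, pairs, targets), reverse=True)
        parents = scored[: pop_size // 2]        # truncation selection
        children = [scored[0][:]]                # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_weights)    # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clamped so weights stay non-negative
            child = [max(0.0, g + rng.gauss(0, mutation_sigma)) for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda w: fitness(w, pairs, targets))

# Toy data, invented for illustration: three properties per sequence.
vectors = [[0.1, 0.5, 0.9], [0.4, 0.2, 0.7], [0.8, 0.6, 0.1], [0.3, 0.9, 0.4]]
pairs = [(vectors[i], vectors[j]) for i in range(4) for j in range(i + 1, 4)]
hidden = [0.7, 0.1, 0.2]  # stand-in for the "true" weighting behind the targets
targets = [weighted_distance(u, v, hidden) for u, v in pairs]

best = evolve_weights(pairs, targets, n_weights=3)
print("evolved weights:", best)
print("fitness:", fitness(best, pairs, targets))
```

Once weights are fixed, each distance costs a single pass over a short property vector rather than a full pairwise alignment, which is the source of the cost reduction the abstract refers to.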

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Perez, O.M., Marin, F.J., Trelles, O. (2001). Improving Biological Sequence Property Distances by Using a Genetic Algorithm. In: Mira, J., Prieto, A. (eds) Bio-Inspired Applications of Connectionism. IWANN 2001. Lecture Notes in Computer Science, vol 2085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45723-2_65

  • DOI: https://doi.org/10.1007/3-540-45723-2_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42237-2

  • Online ISBN: 978-3-540-45723-7

  • eBook Packages: Springer Book Archive
