Computational and Statistical Methods in Bioinformatics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3430)

Abstract

Many computational and statistical methods have been developed and applied in bioinformatics. Recently, new approaches based on support vector machines have been developed. Support vector machines provide a way of combining computational methods and statistical methods. After overviewing fundamental computational and statistical methods in bioinformatics, this paper surveys how these methods are used with support vector machines in order to analyze biological sequence data. This paper also overviews a method to handle chemical structures using support vector machines.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akutsu, T. (2005). Computational and Statistical Methods in Bioinformatics. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_2

  • DOI: https://doi.org/10.1007/11423270_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26157-5

  • Online ISBN: 978-3-540-31933-7

  • eBook Packages: Computer Science, Computer Science (R0)
