
Equivalence Learning in Protein Classification

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Abstract

We present a method, called equivalence learning, which applies a two-class classification approach to object pairs defined within a multi-class scenario. The underlying idea is that, instead of classifying objects into their respective classes, we classify object pairs as either equivalent (belonging to the same class) or non-equivalent (belonging to different classes). The method is based on a vectorisation of the similarity between the objects and on the application of a machine learning algorithm (SVM, ANN, LogReg, Random Forests) to learn the differences between equivalent and non-equivalent object pairs. We also define a unique kernel function that can be obtained via equivalence learning. Using a small dataset of archaeal, bacterial and eukaryotic 3-phosphoglycerate-kinase sequences, we found that the classification performance of equivalence learning slightly exceeds that of several simple machine learning algorithms, at the price of a minimal increase in time and space requirements.
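The pair-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the similarity is vectorised, so the element-wise absolute difference used in `pair_vector` is an assumption, as are the toy 2-D objects, the function names, and the plain gradient-descent logistic regression standing in for the LogReg learner.

```python
import math
from itertools import combinations

def pair_vector(a, b):
    # Assumed vectorisation (hypothetical): element-wise absolute difference.
    return [abs(x - y) for x, y in zip(a, b)]

def make_pairs(objects, labels):
    # Turn a multi-class dataset into a two-class dataset of object pairs:
    # label 1 = equivalent (same class), 0 = non-equivalent.
    X, y = [], []
    for i, j in combinations(range(len(objects)), 2):
        X.append(pair_vector(objects[i], objects[j]))
        y.append(1 if labels[i] == labels[j] else 0)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=2000):
    # Batch gradient descent on the logistic loss; returns (weights, bias).
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - yi
            for k, xk in enumerate(xi):
                gw[k] += err * xk
            gb += err
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def equivalence_score(model, a, b):
    # Estimated probability that objects a and b belong to the same class.
    w, bias = model
    return sigmoid(sum(wk * xk for wk, xk in zip(w, pair_vector(a, b))) + bias)

# Toy data: three classes of 2-D points (illustrative only, not protein data).
objects = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (3.0, 3.0), (3.1, 2.9), (2.9, 3.2),
           (0.0, 3.0), (0.2, 3.1), (0.1, 2.8)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

X, y = make_pairs(objects, labels)
model = train_logreg(X, y)

same = equivalence_score(model, (0.05, 0.1), (0.15, 0.05))  # both near class A
diff = equivalence_score(model, (0.05, 0.1), (3.0, 3.1))    # class A vs class B
print(f"same-class pair: {same:.3f}, different-class pair: {diff:.3f}")
```

Nine objects in three classes yield 36 pairs, of which 9 are equivalent, so a single two-class learner replaces the three-class problem. Because the assumed pair vector is symmetric in its arguments, the resulting score is symmetric as well.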



Editor information

Petra Perner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kertész-Farkas, A., Kocsor, A., Pongor, S. (2007). Equivalence Learning in Protein Classification. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_62

  • DOI: https://doi.org/10.1007/978-3-540-73499-4_62

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73498-7

  • Online ISBN: 978-3-540-73499-4

  • eBook Packages: Computer Science, Computer Science (R0)
