Abstract
We present a method, called equivalence learning, that applies a two-class classification approach to object pairs defined within a multi-class scenario. The underlying idea is that, instead of classifying objects into their respective classes, we classify object pairs as either equivalent (belonging to the same class) or non-equivalent (belonging to different classes). The method is based on a vectorisation of the similarity between the objects and the application of a machine learning algorithm (SVM, ANN, LogReg, Random Forests) to learn the differences between equivalent and non-equivalent object pairs. We also define a unique kernel function that can be obtained via equivalence learning. Using a small dataset of archaeal, bacterial and eukaryotic 3-phosphoglycerate-kinase sequences, we found that the classification performance of equivalence learning slightly exceeds that of several simple machine learning algorithms, at the price of a minimal increase in time and space requirements.
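The pairwise reformulation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 2-D feature vectors are made up, the absolute-difference vectorisation in `pair_vector` is an assumption chosen for illustration, and a hand-written logistic regression stands in for the learners named in the abstract (SVM, ANN, LogReg, Random Forests). All function names are hypothetical.

```python
from itertools import combinations
import math


def pair_vector(u, v):
    # Symmetric vectorisation of a pair: componentwise absolute difference.
    # Illustrative assumption, not the vectorisation used in the paper.
    return [abs(a - b) for a, b in zip(u, v)]


def make_pairs(objects, labels):
    # Turn a multi-class dataset into a two-class dataset of object pairs:
    # target 1 = equivalent (same class), 0 = non-equivalent.
    X, y = [], []
    for i, j in combinations(range(len(objects)), 2):
        X.append(pair_vector(objects[i], objects[j]))
        y.append(1 if labels[i] == labels[j] else 0)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=3000, lr=0.1):
    # Plain batch gradient descent on the mean logistic loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - t
            grad_b += err
            for k in range(d):
                grad_w[k] += err * x[k]
        w = [wk - lr * gk / n for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def equivalence_score(model, u, v):
    # Estimated probability that u and v belong to the same class.
    w, b = model
    x = pair_vector(u, v)
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


# Made-up feature vectors standing in for three sequence classes.
objects = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.0, 5.0), (0.2, 5.1)]
labels = ["archaea", "archaea", "bacteria", "bacteria", "eukaryota", "eukaryota"]

X, y = make_pairs(objects, labels)   # 6 objects give 15 pairs, 3 of them equivalent
model = train_logreg(X, y)
same = equivalence_score(model, (0.05, 0.1), (0.1, 0.0))
diff = equivalence_score(model, (0.0, 0.1), (5.1, 5.0))
print(len(X), sum(y), same > 0.5, diff < 0.5)
```

The learned score can be read as a similarity between two objects, which is the role a kernel obtained via equivalence learning would play. This sketch makes no claim that the score is positive definite, and the number of training pairs grows quadratically with the number of objects.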
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kertész-Farkas, A., Kocsor, A., Pongor, S. (2007). Equivalence Learning in Protein Classification. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_62
DOI: https://doi.org/10.1007/978-3-540-73499-4_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science, Computer Science (R0)