Abstract
During the last years, kernel based methods proved to be very successful for many real-world learning problems. One of the main reasons for this success is the efficiency on large data sets which is a result of the fact that kernel methods like support vector machines (SVM) are based on a convex optimization problem. Solving a new learning problem can now often be reduced to the choice of an appropriate kernel function and kernel parameters. However, it can be shown that even the most powerful kernel methods can still fail on quite simple data sets in cases where the inherent feature space induced by the used kernel function is not sufficient. In these cases, an explicit feature space transformation or detection of latent variables proved to be more successful. Since such an explicit feature construction is often not feasible for large data sets, the ultimate goal for efficient kernel learning would be the adaptive creation of new and appropriate kernel functions. It can, however, not be guaranteed that such a kernel function still leads to a convex optimization problem for Support Vector Machines. Therefore, we have to enhance the optimization core of the learning method itself before we can use it with arbitrary, i.e., non-positive semidefinite, kernel functions. This article motivates the usage of appropriate feature spaces and discusses the possible consequences leading to non-convex optimization problems. We will show that these new non-convex optimization SVM are at least as accurate as their quadratic programming counterparts on eight real-world benchmark data sets in terms of the generalization performance. They always outperform traditional approaches in terms of the original optimization problem. Additionally, the proposed algorithm is more generic than existing traditional solutions since it will also work for non-positive semidefinite or indefinite kernel functions.
Similar content being viewed by others
References
Alpay D (2001) The schur algorithm, reproducing kernel spaces and system theory. In: SMF/AMS texts and monographs, vol 5. SMF, France
Beyer H-G, Schwefel H-P (2002) Evolution strategies: a comprehensive introduction. J Natural Comput 1(1): 2–52
Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 2(2): 121–167
Camps-Valls G, Martin-Guerrero J, Rojo-Alvarez J, Soria-Olivas E (2004) Fuzzy sigmoid kernel for support vector classifiers. Neurocomputing 62: 501–506
Chang C-C, Lin C-J (2001) LIBSVM: a library for support vector machines
Frp̈hlich H, Chapelle O, Schölkopf B (2004) Feature selection for support vector machines using genetic algorithms. Int J Artif Intell Tools 13(4): 791–800
Haasdonk B (2005) Feature space interpretation of svms with indefinite kernels. IEEE Trans Pattern Anal Mach Intell 27(4): 482–492
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction Springer Series in Statistics. Springer, Heidelberg
Joachims T (1999) Making large-scale SVM learning practical. In: Schölkopf B, Burges C, Smola A (eds). Advances in Kernel Methods—support vector learning, chapter 11. MIT Press, Cambridge
Kimeldorf GS, Wahba G (1971) Some results on Tchebycheffian spline functions. J Math Anal Appl 33: 82–95
Lin H-T, Lin C-J (2003a) A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods
Lin H-T, Lin C-J (2003b) A study on sigmoid kernels for svm and the training of non-PSD kernels by SMO-type methods
Mary X (2003) Hilbertian subspaces, subdualities and applications. PhD thesis, Institut National des Sciences Appliquees Rouen
Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans Roy Soc A 209: 415–446
Mierswa I (2006) Evolutionary learning with kernels: a generic solution for large margin problems. In: Proceedings of the genetic and evolutionary computation conference (GECCO 2006)
Mierswa I, Wurst M, Klinkenberg R, Scholz M, Euler T (2006) YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2006)
Newman D, Hettich S, Blake C, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
Ong C, Mary X, Canu S, Smola AJ (2004a) Learning with non-positive kernels. In: Proceedings of the 21st international conference on machine learning (ICML), pp 639–646
Ong C, Mary X, Canu S, Smola AJ (2004b) Learning with non-positive kernels. In: Proceedings of the 21st international conference on machine learning (ICML 2004), pp 639–646
Platt J (1999) Advances in large margin classifiers. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. MIT Press, Cambridge
Rasmussen CE, Quinonero-Candela J (2005) Healing the relevance vector machine through augmentation. In: Proceedings of the 22nd international conference on machine learning (ICML 2005). ACM Press, New York, pp 689–696
Rechenberg I (1973) Evolutionsstrategie: optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann-Holzboog, Stuttgart
Ritthoff O, Klinkenberg R, Fischer S, Mierswa I (2002) A hybrid approach to feature selection and generation using an evolutionary algorithm. In: Proceedings of the 2002 U.K. workshop on computational intelligence (UKCI-02). University of Birmingham, pp 147–154
Rüping S (2000) mySVM Manual. Universität Dortmund, Lehrstuhl Informatik VIII. http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM
Schölkopf B, Smola AJ (2002) Learning with kernels—support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Smola AJ, Ovari ZL, Williamson RC (2000) Regularization with dot-product kernels. In: Proceedings of the neural information processing systems (NIPS), pp 308–314
StatLib (2002) Statlib—datasets archive. http://lib.stat.cmu.edu/datasets
Storch T (2005) On the impact of objective function transformations on evolutionary and black-box algorithms. In: Proceedings of the genetic and evolutionary computation conference (GECCO), pp 833–840
Tipping ME (2001) Sparse bayesian learning and the relevance vector machine. J Mach Learn Res 1: 211–244
Vapnik V (1998) Statistical learning theory. Wiley, New York
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Mierswa, I., Morik, K. About the non-convex optimization problem induced by non-positive semidefinite kernel learning. Adv Data Anal Classif 2, 241–258 (2008). https://doi.org/10.1007/s11634-008-0033-4
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11634-008-0033-4