About the non-convex optimization problem induced by non-positive semidefinite kernel learning

Mierswa, Ingo; Morik, Katharina

doi:10.1007/s11634-008-0033-4

Regular Article
Published: 02 December 2008

Advances in Data Analysis and Classification, Volume 2, pages 241–258 (2008)

Ingo Mierswa¹ &
Katharina Morik¹

Abstract

In recent years, kernel-based methods have proved very successful for many real-world learning problems. One of the main reasons for this success is their efficiency on large data sets, which results from the fact that kernel methods such as support vector machines (SVMs) are based on a convex optimization problem. Solving a new learning problem can now often be reduced to choosing an appropriate kernel function and kernel parameters. However, it can be shown that even the most powerful kernel methods can still fail on quite simple data sets when the feature space induced by the chosen kernel function is not sufficient. In these cases, an explicit feature space transformation or the detection of latent variables has proved more successful. Since such an explicit feature construction is often not feasible for large data sets, the ultimate goal for efficient kernel learning would be the adaptive creation of new and appropriate kernel functions. It cannot be guaranteed, however, that such a kernel function still leads to a convex optimization problem for SVMs. Therefore, we have to enhance the optimization core of the learning method itself before we can use it with arbitrary, i.e., non-positive semidefinite, kernel functions. This article motivates the use of appropriate feature spaces and discusses the possible consequences, which lead to non-convex optimization problems. We show that these new non-convex optimization SVMs are at least as accurate as their quadratic programming counterparts on eight real-world benchmark data sets in terms of generalization performance. They always outperform traditional approaches in terms of the original optimization problem. Additionally, the proposed algorithm is more generic than existing traditional solutions, since it also works for non-positive semidefinite or indefinite kernel functions.
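To make the link between the kernel and convexity concrete, the following minimal sketch checks whether a kernel yields a positive semidefinite Gram matrix on two sample points. It is an illustration only and is not taken from the article: the sigmoid (tanh) kernel, the parameter values `scale=1.0` and `offset=-1.0`, the two data points, and all function names are assumptions chosen for the example. A negative eigenvalue of the Gram matrix means the quadratic term of the SVM dual can be non-convex, so a standard quadratic programming solver is no longer guaranteed to find the global optimum.

```python
import math


def sigmoid_kernel(x, z, scale=1.0, offset=-1.0):
    """Sigmoid (tanh) kernel on scalars; not PSD for every scale/offset."""
    return math.tanh(scale * x * z + offset)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel on scalars; PSD for any gamma > 0."""
    return math.exp(-gamma * (x - z) ** 2)


def gram_2x2(kernel, x1, x2):
    """Gram matrix of two points as a nested list."""
    return [[kernel(x1, x1), kernel(x1, x2)],
            [kernel(x2, x1), kernel(x2, x2)]]


def smallest_eigenvalue_2x2(K):
    """Closed-form smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = K[0][0], K[0][1], K[1][1]
    return (a + d) / 2.0 - math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)


# Two hypothetical one-dimensional training points.
x1, x2 = 1.0, 2.0

lam_sigmoid = smallest_eigenvalue_2x2(gram_2x2(sigmoid_kernel, x1, x2))
lam_rbf = smallest_eigenvalue_2x2(gram_2x2(rbf_kernel, x1, x2))

# A negative value marks an indefinite Gram matrix (non-convex dual);
# a non-negative value is consistent with a PSD kernel on these points.
print("sigmoid kernel, smallest eigenvalue:", lam_sigmoid)
print("RBF kernel, smallest eigenvalue:", lam_rbf)
```

With these assumed parameters the sigmoid Gram matrix has a negative smallest eigenvalue while the RBF Gram matrix does not, which is the situation the abstract describes for non-positive semidefinite kernel functions.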

Author information

Authors and Affiliations

Artificial Intelligence Unit, Department of Computer Science, Technical University of Dortmund, Dortmund, Germany
Ingo Mierswa & Katharina Morik

Corresponding author

Correspondence to Katharina Morik.

About this article

Cite this article

Mierswa, I., Morik, K. About the non-convex optimization problem induced by non-positive semidefinite kernel learning. Adv Data Anal Classif 2, 241–258 (2008). https://doi.org/10.1007/s11634-008-0033-4

Received: 02 February 2008
Revised: 25 September 2008
Accepted: 22 October 2008
Published: 02 December 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s11634-008-0033-4
