Abstract
Kernel-based methods such as Support Vector Machines (SVM) are established as powerful techniques in machine learning. The idea of SVM is to map the input space into a higher-dimensional feature space via a kernel function, so that a linear learning algorithm can be employed. However, the burden of choosing an appropriate kernel function is usually left to the user. The accuracy of the learned model depends strongly on the chosen kernel function and its parameters, especially for complex tasks. To obtain a good classification or regression model, an appropriate kernel function must be combined with well-chosen pre- and post-processing of the data. To address these obstacles, we present two approaches to optimizing kernel functions: (a) automated hyperparameter tuning of kernel functions, combined with an optimization of pre- and post-processing options, by Sequential Parameter Optimization (SPO), and (b) evolving new kernel functions by Genetic Programming (GP). We review modern techniques for both approaches and compare their respective strengths and weaknesses. We apply tuning to SVM kernels for both regression and classification. Automatic hyperparameter tuning of standard kernels and of pre- and post-processing options consistently yielded systems with excellent prediction accuracy on the considered problems. In particular, SPO-tuned kernels led to considerably better results than all other tested tuning approaches. Regarding GP-based kernel evolution, our method rediscovered several standard kernels, but no significant improvements over standard kernels were obtained.
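The abstract describes kernel hyperparameter tuning only in prose. As a rough illustration, the following minimal sketch shows conventional cross-validated grid search over the RBF kernel hyperparameters of an SVM classifier using scikit-learn; it is a simple stand-in for the model-based SPO tuning used in the paper, and the dataset, scaling step, and parameter ranges are illustrative assumptions rather than the paper's actual setup.

    # Minimal sketch (not the paper's SPO procedure): cross-validated grid
    # search over the C and gamma hyperparameters of an RBF-kernel SVM.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Pre-processing (here: feature scaling) is tuned jointly with the kernel
    # parameters, echoing the idea of optimizing pre-processing options as well.
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    param_grid = {
        "svc__C": [0.1, 1, 10, 100],          # regularization constant
        "svc__gamma": [1e-3, 1e-2, 1e-1, 1],  # RBF kernel width
    }
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)

    print("best parameters:", search.best_params_)
    print("cross-validated accuracy: %.3f" % search.best_score_)

A sequential model-based tuner such as SPO would replace the exhaustive grid with a surrogate model that proposes promising parameter settings iteratively, which typically needs far fewer model evaluations.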
Notes
As an alternative to SVM, we also tested Random Forest (RF), which gave similar results.
Acknowledgments
This work was partly supported by the Research Training Group “Statistical Modelling” of the German Research Foundation, by the Bundesministerium für Bildung und Forschung (BMBF) under grant SOMA (AiF FKZ 17N1009), and by the Cologne University of Applied Sciences under the research focus grant COSA. Some experimental calculations were performed on the LiDO HPC cluster at TU Dortmund; we would like to thank the LiDO team for their support.
Additional information
The first, second, and third authors contributed equally.
About this article
Cite this article
Koch, P., Bischl, B., Flasch, O. et al. Tuning and evolution of support vector kernels. Evol. Intel. 5, 153–170 (2012). https://doi.org/10.1007/s12065-012-0073-8