Abstract
Kernel-based methods such as Support Vector Machines (SVM) are established as powerful techniques in machine learning. The idea of SVM is to map the input space into a higher-dimensional feature space via a kernel function, so that a linear learning algorithm can be employed. However, the burden of choosing an appropriate kernel function is usually left to the user. The accuracy of the learned model depends strongly on the chosen kernel function and its parameters, especially for complex tasks. To obtain a good classification or regression model, an appropriate kernel function must be combined with well-chosen pre- and post-processing of the data. To address these obstacles, we present two approaches to optimizing kernel functions: (a) automated hyperparameter tuning of kernel functions, combined with an optimization of pre- and post-processing options, by Sequential Parameter Optimization (SPO), and (b) evolving new kernel functions by Genetic Programming (GP). We review modern techniques for both approaches and compare their respective strengths and weaknesses. We apply tuning to SVM kernels for both regression and classification. Automatic hyperparameter tuning of standard kernels and of pre- and post-processing options consistently yielded systems with excellent prediction accuracy on the considered problems. In particular, SPO-tuned kernels led to considerably better results than all other tested tuning approaches. Regarding GP-based kernel evolution, our method rediscovered several standard kernels, but no significant improvements over standard kernels were obtained.
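The abstract describes kernel hyperparameter tuning only in prose. As a rough illustration, the following minimal sketch shows conventional cross-validated grid search over the RBF kernel hyperparameters of an SVM classifier using scikit-learn; it is a simple stand-in for the model-based SPO tuning used in the paper, and the dataset, scaling step, and parameter ranges are illustrative assumptions rather than the paper's actual setup.

    # Minimal sketch (not the paper's SPO procedure): cross-validated grid
    # search over the C and gamma hyperparameters of an RBF-kernel SVM.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Pre-processing (here: feature scaling) is tuned jointly with the kernel
    # parameters, echoing the idea of optimizing pre-processing options as well.
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    param_grid = {
        "svc__C": [0.1, 1, 10, 100],          # regularization constant
        "svc__gamma": [1e-3, 1e-2, 1e-1, 1],  # RBF kernel width
    }
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    search.fit(X, y)

    print("best parameters:", search.best_params_)
    print("cross-validated accuracy: %.3f" % search.best_score_)

A sequential model-based tuner such as SPO would replace the exhaustive grid with a surrogate model that proposes promising parameter settings iteratively, which typically needs far fewer model evaluations.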
Notes
As an alternative to SVM, we also tested Random Forest (RF), which gave similar results.
Acknowledgments
This work was partly supported by the Research Training Group “Statistical Modelling” of the German Research Foundation, by the Bundesministerium für Bildung und Forschung (BMBF) under grant SOMA (AiF FKZ 17N1009), and by the Cologne University of Applied Sciences under the research focus grant COSA. Some experimental calculations were performed on the LiDO HPC cluster at TU Dortmund; we would like to thank the LiDO team for their support.
Additional information
The first, second, and third authors contributed equally.
About this article
Cite this article
Koch, P., Bischl, B., Flasch, O. et al. Tuning and evolution of support vector kernels. Evol. Intel. 5, 153–170 (2012). https://doi.org/10.1007/s12065-012-0073-8