
Tuning and evolution of support vector kernels

Special Issue · Evolutionary Intelligence

Abstract

Kernel-based methods such as Support Vector Machines (SVM) are established as powerful techniques in machine learning. The idea of SVM is to map the input space into a higher-dimensional feature space by means of a kernel function, so that a linear learning algorithm can be applied there. However, the burden of choosing an appropriate kernel function is usually left to the user. It can easily be shown that the accuracy of the learned model depends strongly on the chosen kernel function and its parameters, especially for complex tasks. To obtain a good classification or regression model, an appropriate kernel function must be combined with optimized pre- and post-processing of the data. To address these obstacles, we present two solutions for optimizing kernel functions: (a) automated hyperparameter tuning of kernel functions, combined with an optimization of pre- and post-processing options, by Sequential Parameter Optimization (SPO) and (b) evolving new kernel functions by Genetic Programming (GP). We review modern techniques for both approaches and compare their respective strengths and weaknesses. We apply tuning to SVM kernels for both regression and classification. Automatic hyperparameter tuning of standard kernels together with pre- and post-processing options consistently yielded systems with excellent prediction accuracy on the considered problems. In particular, SPO-tuned kernels led to considerably better results than all other tuning approaches tested. Regarding GP-based kernel evolution, our method rediscovered several standard kernels, but no significant improvements over standard kernels were obtained.
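For illustration, the two approaches from the abstract can be sketched in a few lines of code. The original study used an R toolchain (SPO via SPOT, with kernlab and mlr as learners); the Python/scikit-learn sketch below is a hedged stand-in, not the authors' implementation: a plain grid search replaces SPO's model-based tuning, and a fixed convex mixture of an RBF and a polynomial kernel stands in for a GP-evolved kernel expression (sums and products of positive semi-definite kernels remain positive semi-definite). The dataset, parameter ranges, and mixture weights are illustrative assumptions.

```python
# Hedged sketch of (a) kernel hyperparameter tuning and (b) a combined
# kernel of the kind GP might evolve. NOT the authors' R/SPOT code;
# dataset and parameter ranges are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) Tune an RBF kernel jointly with a pre-processing option (scaling);
# a simple grid search stands in for SPO's sequential, model-based tuning.
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
grid = {"svm__C": np.logspace(-2, 3, 6), "svm__gamma": np.logspace(-4, 1, 6)}
search = GridSearchCV(pipe, grid, cv=5).fit(X_tr, y_tr)
print("tuned RBF params:", search.best_params_)
print("tuned RBF accuracy:", search.score(X_te, y_te))

# (b) A hand-written combined kernel: a convex mixture of an RBF and a
# polynomial kernel, analogous to an expression GP could discover.
def mixed_kernel(A, B, gamma=1e-3, degree=2, w=0.5):
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    rbf = np.exp(-gamma * sq_dist)        # RBF component
    poly = (A @ B.T + 1.0) ** degree      # polynomial component
    return w * rbf + (1.0 - w) * poly     # PSD mixture of PSD kernels

scaler = StandardScaler().fit(X_tr)
svm = SVC(kernel=mixed_kernel).fit(scaler.transform(X_tr), y_tr)
print("mixed-kernel accuracy:", svm.score(scaler.transform(X_te), y_te))
```

Note that a full SPO run would fit a surrogate model over the tuning results obtained so far and sequentially propose new parameter settings, rather than evaluating a fixed grid; the grid search above only mimics the interface, not the sequential design.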


Notes

  1. As an alternative to SVM, we also tested Random Forest (RF), which gave similar results.



Acknowledgments

This work was partly supported by the Research Training Group “Statistical Modelling” of the German Research Foundation, by the Bundesministerium für Bildung und Forschung (BMBF) under grant SOMA (AiF FKZ 17N1009), and by the Cologne University of Applied Sciences under the research focus grant COSA. Some experimental calculations were performed on the LiDO HPC cluster at the TU Dortmund. We would like to thank the LiDO team for their support.

Author information


Corresponding author

Correspondence to Patrick Koch.

Additional information

The first, second and third authors contributed equally.


About this article

Cite this article

Koch, P., Bischl, B., Flasch, O. et al. Tuning and evolution of support vector kernels. Evol. Intel. 5, 153–170 (2012). https://doi.org/10.1007/s12065-012-0073-8

