Improving classification performance of Support Vector Machine by genetically optimising kernel shape and hyper-parameters

Published in Applied Intelligence

Abstract

Support Vector Machines (SVMs) deliver state-of-the-art performance in real-world applications and are now established as one of the standard tools for machine learning and data mining. A key problem of these methods is how to choose an optimal kernel and how to optimise its parameters. Real-world applications have also emphasised the need to consider a combination of kernels (a multiple kernel) in order to boost classification accuracy by adapting the kernel to the characteristics of heterogeneous data. This combination can be linear or non-linear, weighted or un-weighted. Several approaches have already been proposed to find a linear weighted kernel combination and to optimise its parameters together with the SVM parameters, but none has attempted to optimise a non-linear weighted combination. Our goal is therefore to automatically generate and adapt a kernel combination (linear or non-linear, weighted or un-weighted, according to the data) and to optimise both the kernel parameters and the SVM parameters by evolutionary means in a unified framework. We denote this combination a kernel of kernels (KoK). Numerical experiments show that the SVM algorithm using the evolutionary kernel of kernels (eKoK) we propose performs better than well-known classic kernels with optimised parameters, as well as a state-of-the-art convex linear kernel combination and an evolutionary linear kernel combination. These results emphasise that the SVM algorithm may require a non-linear weighted combination of kernels.
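
To make the kernel-of-kernels idea concrete, the sketch below builds a small non-linear weighted combination of an RBF kernel and a polynomial kernel and plugs it into an SVM as a custom kernel. This is a minimal sketch, not the paper's eKoK method: scikit-learn, the synthetic dataset, the particular weights, the product term and all hyper-parameter values are illustrative assumptions, whereas eKoK evolves the shape of the combination and its parameters (together with the SVM parameters) by genetic programming rather than fixing them by hand.

    # Minimal sketch (assumptions: scikit-learn, a synthetic dataset, hand-picked
    # weights and hyper-parameters). It is not the paper's eKoK implementation.
    from sklearn.datasets import make_classification
    from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def kernel_of_kernels(X, Y, w1=0.6, w2=0.4, gamma=0.5, degree=2):
        """Non-linear weighted combination of two base kernels:

            K = w1 * K_rbf + w2 * (K_rbf * K_poly)

        Sums and element-wise products of Mercer kernels are again Mercer
        kernels, so this combination is a valid SVM kernel.
        """
        k_rbf = rbf_kernel(X, Y, gamma=gamma)
        k_poly = polynomial_kernel(X, Y, degree=degree, coef0=1.0)
        return w1 * k_rbf + w2 * (k_rbf * k_poly)

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    # SVC accepts a callable kernel; in the evolutionary framework C and the
    # kernel's parameters would be optimised jointly instead of being fixed here.
    clf = SVC(kernel=kernel_of_kernels, C=1.0)
    scores = cross_val_score(clf, X, y, cv=5)
    print("KoK accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

Passing a callable as the kernel keeps the Gram-matrix construction explicit, which is convenient when the shape of the combination changes from one evolved individual to the next, as it does in the framework described in the paper.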



Author information

Corresponding author

Correspondence to Laura Dioşan.

About this article

Cite this article

Dioşan, L., Rogozan, A. & Pécuchet, J.P. Improving classification performance of Support Vector Machine by genetically optimising kernel shape and hyper-parameters. Appl Intell 36, 280–294 (2012). https://doi.org/10.1007/s10489-010-0260-1
