Abstract
Kernel functions are used in support vector machines (SVMs) to compute inner products in a higher-dimensional feature space, and SVM classification performance depends on the chosen kernel. The radial basis function (RBF) kernel is a distance-based kernel that has been applied successfully in many tasks. This paper focuses on improving the accuracy of SVMs by proposing a non-linear combination of multiple RBF kernels, yielding more flexible kernel functions: multi-scale RBF kernels are weighted and combined, which allows better discrimination in the feature space. The resulting kernel is proved to be a Mercer kernel. Furthermore, evolutionary strategies (ESs) are used to adjust the hyperparameters of the SVM, with training accuracy, the bound on generalization error, and subset cross-validation on training accuracy considered as objective functions in the evolutionary process. The experimental results show that multi-scale RBF kernels are more accurate than a single RBF kernel. Moreover, subset cross-validation on training accuracy proves the most suitable objective, yielding good results on benchmark datasets.
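The core idea above can be sketched in a few lines: a non-negative weighted sum of RBF kernels at different widths is itself a Mercer kernel (a sum of positive semi-definite matrices with non-negative weights remains positive semi-definite), so it can be plugged into any SVM that accepts a precomputed kernel. The weights and widths below are illustrative placeholders; in the paper they are tuned by evolutionary strategies.

```python
import numpy as np

def multi_scale_rbf(X, Y, weights, gammas):
    """Non-negative weighted combination of RBF kernels at several scales.

    Each term exp(-gamma * ||x - y||^2) is a Mercer kernel, so the
    weighted sum (weights >= 0) is also a Mercer kernel.
    """
    # Squared Euclidean distances between all pairs of rows of X and Y
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    K = np.zeros((X.shape[0], Y.shape[0]))
    for a, g in zip(weights, gammas):
        K += a * np.exp(-g * d2)
    return K

# Illustrative hyperparameters (two scales), fixed by hand here;
# the paper adjusts them with an evolutionary strategy instead.
weights = [0.7, 0.3]
gammas = [0.5, 5.0]

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = multi_scale_rbf(X, X, weights, gammas)
print(K.shape)               # (5, 5)
print(np.allclose(K, K.T))   # True: the Gram matrix is symmetric
```

The resulting Gram matrix can be passed to an SVM implementation that supports precomputed kernels; the evolutionary search then treats `weights`, `gammas`, and the SVM regularization constant as one real-valued chromosome scored by the chosen objective function.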
Acknowledgments
The authors acknowledge the financial support provided by the Thailand Research Fund, the Royal Golden Jubilee Ph.D. Program, and the 90th Anniversary of Chulalongkorn University Fund (Ratchadaphiseksomphot Endowment Fund). The authors also would like to thank Ananlada Chotimongkol for proofreading the paper.
Phienthrakul, T., Kijsirikul, B. Evolutionary strategies for hyperparameters of support vector machines based on multi-scale radial basis function kernels. Soft Comput 14, 681–699 (2010). https://doi.org/10.1007/s00500-009-0458-5