Abstract
Support Vector Machine (SVM) classifiers are high-performance classification models that follow the structural risk minimization principle and exploit the kernel trick: input data are mapped nonlinearly into a high-dimensional feature space, where a linear decision boundary with better discriminating power can be constructed automatically. Among the SVM variants, Least-Squares SVMs (LS-SVMs) have attracted growing attention, mainly because of their computationally attractive properties, which follow directly from a modified formulation that combines a sum-squared-error cost function with equality, rather than inequality, constraints. In this work, we present a flexible hybrid approach aimed at improving LS-SVM classifiers with respect to both accuracy/generalization and hyperparameter calibration. The approach, named Mixtures of Weighted Least-Squares Support Vector Machine Experts, centers on the fusion of the weighted variant of LS-SVMs with Mixtures of Experts models. After formally characterizing the novel learning framework, we report simulation results on both binary and multiclass pattern classification problems, which confirm the suitability of the hybrid approach for improving the performance aspects considered.
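To make the computational appeal of the LS-SVM formulation concrete, the following is a minimal sketch, not the authors' implementation, of how a sum-squared-error cost with equality constraints reduces training to solving a single linear system instead of a quadratic program. The function names (`rbf_kernel`, `train_ls_svm`, `predict_ls_svm`), the choice of a Gaussian kernel, and the values of `gamma` and `sigma` are assumptions made for illustration. The optional per-sample weights `v` stand in for the weighted variant and are taken as given here; the mixture-of-experts gating is not shown.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def train_ls_svm(X, y, gamma=10.0, sigma=1.0, v=None):
    """Solve the LS-SVM classifier linear system and return (alpha, b).

    X: (n, d) inputs; y: (n,) labels in {-1, +1}.
    v: optional per-sample error weights (assumed given); None means all ones.
    """
    n = X.shape[0]
    v = np.ones(n) if v is None else np.asarray(v, dtype=float)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    # Block system:
    #   [[0, y^T], [y, Omega + diag(1 / (gamma * v))]] [b; alpha] = [0; 1]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.diag(1.0 / (gamma * v))
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[1:], solution[0]


def predict_ls_svm(X_train, y_train, alpha, b, X_new, sigma=1.0):
    """Classify new points by the sign of the kernel expansion plus bias."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0.0, 1, -1)


if __name__ == "__main__":
    # XOR-like toy problem: not linearly separable in the input space.
    X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
    y = np.array([-1.0, -1.0, 1.0, 1.0])
    alpha, b = train_ls_svm(X, y, gamma=10.0, sigma=0.5)
    print(predict_ls_svm(X, y, alpha, b, X, sigma=0.5))
```

Under these assumptions every training point receives a nonzero multiplier, so the solution is not sparse; this is the usual trade-off for replacing the quadratic program with a linear solve.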
Acknowledgments
Fapesp sponsored the work of the first author via process #04/09597-0; CNPq/Funcap sponsored the work of the second author via process #23661-04; and CNPq sponsored the work of the third author via grant #303214/2007-0.
Cite this article
Lima, C.A.M., Coelho, A.L.V. & Von Zuben, F.J. Pattern classification with mixtures of weighted least-squares support vector machine experts. Neural Comput & Applic 18, 843–860 (2009). https://doi.org/10.1007/s00521-008-0210-6
DOI: https://doi.org/10.1007/s00521-008-0210-6