Abstract
This paper studies the problem of hyper-parameter selection for a linear programming-based support vector machine for regression (LP-SVR). The proposed model is a generalized method that solves a linear least-squares problem using a globalization strategy, inexact computation of first-order information, and an existing analytical method for estimating the initial point in the hyper-parameter space. The minimization problem consists of finding the set of hyper-parameters that minimizes a chosen generalization error function, which may differ from problem to problem. In particular, this research explores two-class, multi-class, and regression problems. Simulation results on standard data sets suggest that the algorithm achieves statistically insignificant variability in the measured residual error. Compared to other hyper-parameter search methods, the proposed method produces the lowest root mean squared error in most cases. Experimental analysis suggests that the proposed approach is better suited for large-scale applications in the particular case of an LP-SVR. Moreover, due to its mathematical formulation, the proposed method can be extended to estimate any number of hyper-parameters.
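The ingredients named in the abstract (a least-squares objective, inexact first-order information, and a globalization strategy) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a plain Gauss-Newton step in place of the paper's quasi-Newton update, forward differences as the "inexact" derivatives, Armijo backtracking as the globalization strategy, and an arbitrary starting point instead of the analytical initial estimate. The function names, the two-parameter restriction, and the toy residual function (a stand-in for a generalization error measured on validation data) are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
ResidualFn = Callable[[Sequence[float]], Vector]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def fd_jacobian_columns(residuals: ResidualFn, theta: Sequence[float],
                        h: float = 1e-6) -> Tuple[Vector, List[Vector]]:
    """Residuals at theta plus forward-difference Jacobian columns (inexact)."""
    r0 = residuals(theta)
    columns = []
    for j in range(len(theta)):
        shifted = list(theta)
        shifted[j] += h
        columns.append([(a - b) / h for a, b in zip(residuals(shifted), r0)])
    return r0, columns  # columns[j][i] approximates d r_i / d theta_j


def fit_two_hyperparameters(residuals: ResidualFn, theta0: Sequence[float],
                            max_iter: int = 50, tol: float = 1e-10,
                            c1: float = 1e-4) -> Vector:
    """Minimize 0.5 * ||r(theta)||^2 over two parameters (illustrative only)."""
    theta = list(theta0)
    for _ in range(max_iter):
        r, (j0, j1) = fd_jacobian_columns(residuals, theta)
        g = [dot(j0, r), dot(j1, r)]  # gradient estimate J^T r
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        # Solve the 2x2 normal equations (J^T J) s = -g in closed form.
        # The tiny ridge on the diagonal is an assumption to avoid a zero pivot.
        a, b, d = dot(j0, j0) + 1e-12, dot(j0, j1), dot(j1, j1) + 1e-12
        det = a * d - b * b
        step = [-(d * g[0] - b * g[1]) / det, -(a * g[1] - b * g[0]) / det]
        # Armijo backtracking: halve the step until sufficient decrease holds.
        f0 = 0.5 * dot(r, r)
        slope = dot(g, step)
        t = 1.0
        while True:
            trial = [theta[0] + t * step[0], theta[1] + t * step[1]]
            rt = residuals(trial)
            if 0.5 * dot(rt, rt) <= f0 + c1 * t * slope or t < 1e-12:
                break
            t *= 0.5
        theta = trial
    return theta


# Toy stand-in for an error function: noiseless samples of 2 * exp(-0.5 * x).
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [2.0 * math.exp(-0.5 * x) for x in XS]


def toy_residuals(theta: Sequence[float]) -> Vector:
    return [theta[0] * math.exp(theta[1] * x) - y for x, y in zip(XS, YS)]
```

Calling `fit_two_hyperparameters(toy_residuals, [1.0, 0.0])` should return values close to the generating pair (2.0, -0.5). In an actual hyper-parameter search, each residual evaluation would involve training and validating a model, so the number of residual calls per iteration (one per parameter for the finite differences, plus the line-search trials) is the dominant cost.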
Acknowledgments
Author P. R. P. performed part of this work while at NASA Goddard Space Flight Center as part of the Graduate Student Summer Program (GSSP 2009) under the supervision of Dr. James C. Tilton. This work was supported in part by the National Council for Science and Technology (CONACyT), Mexico, under Grant 193324/303732, and was mentored by Dr. Greg Hamerly of the Department of Computer Science at Baylor University. Finally, the authors acknowledge the support of the Large-Scale Multispectral Multidimensional Analysis (LSMMA) Laboratory (www.lsmmalab.com).
Additional information
This work was supported in part by NASA Goddard Space Flight Center’s GSSP 2009 program and by the National Council for Science and Technology (CONACyT), Mexico, under Grant 193324/303732.
About this article
Cite this article
Rivas-Perea, P., Cota-Ruiz, J. & Rosiles, JG. A nonlinear least squares quasi-Newton strategy for LP-SVR hyper-parameters selection. Int. J. Mach. Learn. & Cyber. 5, 579–597 (2014). https://doi.org/10.1007/s13042-013-0153-9
DOI: https://doi.org/10.1007/s13042-013-0153-9