A nonlinear least squares quasi-Newton strategy for LP-SVR hyper-parameters selection

Original Article
International Journal of Machine Learning and Cybernetics

Abstract

This paper studies the problem of hyper-parameter selection for a linear programming-based support vector machine for regression (LP-SVR). The proposed model is a generalized method that minimizes a nonlinear least-squares problem using a globalization strategy, inexact computation of first-order information, and an existing analytical method for estimating the initial point in the hyper-parameter space. The minimization problem consists of finding the set of hyper-parameters that minimizes any chosen generalization error function across different problem types; in particular, this research explores two-class, multi-class, and regression problems. Simulation results across standard data sets suggest that the algorithm achieves statistically insignificant variability when measuring the residual error, and that, compared to other hyper-parameter search methods, it produces the lowest root mean squared error in most cases. Experimental analysis suggests that the proposed approach is better suited to large-scale applications in the particular case of an LP-SVR. Moreover, due to its mathematical formulation, the proposed method can be extended to estimate any number of hyper-parameters.
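
Stated concretely, the abstract's recipe is a quasi-Newton minimization of a cross-validated error over the hyper-parameter space, driven by inexact (finite-difference) first-order information and safeguarded by a line-search globalization. The Python sketch below illustrates that general idea only; it is not the authors' implementation: scikit-learn's RBF-kernel SVR stands in for the paper's LP-SVR, and the initial point is a rough, hypothetical heuristic rather than the paper's analytical estimate.

    # Minimal sketch, assuming scikit-learn's SVR as a stand-in for LP-SVR.
    import numpy as np
    from scipy.optimize import minimize
    from sklearn.datasets import make_regression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=200, n_features=5, noise=0.1,
                           random_state=0)

    def cv_rmse(log_theta):
        # Optimize in log space so C, epsilon, and gamma stay positive.
        C, epsilon, gamma = np.exp(log_theta)
        scores = cross_val_score(SVR(C=C, epsilon=epsilon, gamma=gamma),
                                 X, y, cv=5,
                                 scoring="neg_root_mean_squared_error")
        return -scores.mean()  # mean cross-validated RMSE

    # Hypothetical analytical-style initial point: C from the response range,
    # epsilon from a noise guess, gamma from 1/(number of features).
    theta0 = np.log([np.ptp(y), 0.1 * np.std(y), 1.0 / X.shape[1]])

    # BFGS supplies the quasi-Newton step and a line search (globalization);
    # its finite-difference gradient is the inexact first-order information.
    result = minimize(cv_rmse, theta0, method="BFGS",
                      options={"eps": 1e-3, "maxiter": 30})
    C, epsilon, gamma = np.exp(result.x)
    print(f"C={C:.3g}, epsilon={epsilon:.3g}, gamma={gamma:.3g}, "
          f"CV RMSE={result.fun:.4f}")

Substituting the paper's LP-SVR solver, its least-squares error model, and its analytical starting point into this loop would recover the spirit of the proposed strategy.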



Acknowledgments

The author P. R. P. performed part of this work while at NASA Goddard Space Flight Center as part of the Graduate Student Summer Program (GSSP 2009) under the supervision of Dr. James C. Tilton. This work was supported in part by the National Council for Science and Technology (CONACyT), Mexico, under Grant 193324/303732, and was mentored by Dr. Greg Hamerly of the Department of Computer Science at Baylor University. Finally, the authors acknowledge the support of the Large-Scale Multispectral Multidimensional Analysis (LSMMA) Laboratory (www.lsmmalab.com).

Author information

Correspondence to Pablo Rivas-Perea.

Additional information

This work was supported in part by NASA Goddard Space Flight Center’s GSSP 2009 program and by the National Council for Science and Technology (CONACyT), Mexico, under Grant 193324/303732.


About this article

Cite this article

Rivas-Perea, P., Cota-Ruiz, J. & Rosiles, JG. A nonlinear least squares quasi-Newton strategy for LP-SVR hyper-parameters selection. Int. J. Mach. Learn. & Cyber. 5, 579–597 (2014). https://doi.org/10.1007/s13042-013-0153-9

