Abstract
We extend LS-SVM to ordinal regression, which has wide applications in domains such as social science and information retrieval, where human-generated data play an important role. Most current SVM-based methods for ordinal regression ignore the distribution information reflected by the samples clustered around the class centers, which degrades their performance: the resulting classifiers depend only on the scattered borderline samples that induce the large margin. Our method takes the samples clustered around the class centers into account while keeping a competitive computational complexity. Moreover, it readily produces optimal cut-points according to the prior class probabilities and hence may yield more reasonable results when the prior class probabilities differ. Experiments on simulated and benchmark datasets, especially on real ordinal datasets, demonstrate the effectiveness of our method.
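To fix ideas, ordinal regression with cut-points learns a latent score \(f(x)=\langle w,x\rangle \) together with ordered thresholds \(b_1<b_2<\ldots <b_{K-1}\) that partition the real line into \(K\) intervals, one per rank. The sketch below (ours, not the paper's implementation; the function name and the example cut-points are illustrative) shows how such thresholds yield class predictions.

```python
import numpy as np

def predict_ordinal(scores, cut_points):
    """Map latent scores f(x) = <w, x> to ordinal labels 1..K.

    A sample gets class j when b_{j-1} < f(x) <= b_j, with the
    conventions b_0 = -inf and b_K = +inf; cut_points holds
    b_1, ..., b_{K-1} and is assumed sorted increasingly.
    """
    cut_points = np.asarray(cut_points)
    # searchsorted counts how many cut-points lie below each score,
    # which is exactly the 0-based index of the predicted rank.
    return np.searchsorted(cut_points, scores, side="left") + 1

# Illustrative example with K = 4 ranks and cut-points b_1 < b_2 < b_3.
scores = np.array([-1.2, 0.1, 0.7, 2.5])
print(predict_ordinal(scores, cut_points=[-0.5, 0.4, 1.3]))  # [1 2 3 4]
```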




Notes
Since EBC is a framework for reducing an ordinal regression problem to binary classification, its computational complexity varies from \(2N-n_1-n_K\) to \(KN\) as the parameters change.
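As a rough illustration of where this range comes from, here is a minimal sketch of an EBC-style reduction in the spirit of Li and Lin, in which each sample of rank \(y\in \{1,\ldots ,K\}\) generates one binary example per cut-point; the function and variable names are ours, and the per-example weighting of the original framework is omitted.

```python
import numpy as np

def ebc_expand(X, y, K):
    """Reduce an ordinal problem with K ranks to binary classification.

    Each sample (x, y) yields K-1 binary examples: for every cut-point
    j = 1, ..., K-1 the input is x augmented with an encoding of j, and
    the binary label answers "is y > j?". The full expansion therefore
    has (K-1)*N examples; pruning redundant ones yields smaller sets.
    """
    Xb, yb = [], []
    for x, label in zip(X, y):
        for j in range(1, K):
            e = np.zeros(K - 1)          # one-hot encoding of cut-point j
            e[j - 1] = 1.0
            Xb.append(np.concatenate([x, e]))
            yb.append(1 if label > j else -1)
    return np.array(Xb), np.array(yb)

# Example: N = 3 samples with K = 3 ranks -> (K-1)*N = 6 binary examples.
X = np.array([[0.2], [0.5], [0.9]])
y = np.array([1, 2, 3])
Xb, yb = ebc_expand(X, y, K=3)
print(Xb.shape, yb)  # (6, 3) [-1 -1  1 -1  1  1]
```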
The cut-points in this section are normalized by \(\frac{b_j}{\Vert w\Vert }\).
The datasets are available at http://www.gatsby.ucl.ac.uk/~chuwei/ordinalregression.html.
Because the partitions of the first four datasets were provided by Chu, we use those splits in our experiments for comparison purposes.
The datasets are available at the WEKA website (http://www.cs.waikato.ac.nz/ml/index.html).
References
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, Heidelberg
Cruz-Ramírez M, Fernández JC, Valero A, Gutiérrez PA, Hervás-Martínez C (2013) Multiobjective Pareto ordinal classification for predictive microbiology. In: Snášel V, Abraham A, Corchado ES (eds) Soft computing models in industrial and environmental applications. Springer, Berlin, Heidelberg, pp 153–162
Kramer S, Widmer G, Pfahringer B, DeGroeve M (2001) Prediction of ordinal classes using regression trees. Fundam Inf 47(1–2):1–13
Chu W, Keerthi SS (2007) Support vector ordinal regression. Neural Comput 19:792–815
McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London
Crammer K, Singer Y (2002) Pranking with ranking. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in neural information processing systems 14, vol 1. MIT Press, Cambridge, pp 641–647
Herbrich R, Graepel T, Obermayer K (2000) Large margin rank boundaries for ordinal regression. In: Smola AJ, Bartlett PL, Schölkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, Cambridge
Agresti A (2002) Categorical data analysis, 2nd edn. Wiley, New York
McCullagh P (1980) Regression models for ordinal data. J R Stat Soc Ser B 42:109–142
Boser B, Guyon I, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual ACM workshop on computational learning theory (COLT 1992). ACM, pp 144–152
Vapnik V (1998) Statistical learning theory. Wiley, New York
Cristianini N, Shawe-Taylor J (1999) An introduction to support vector machines. Cambridge University Press, Cambridge
Gonzalez L, Angulo C, Velasco F, Catala A (2006) Dual unification of bi-class support vector machine formulations. Pattern Recognit 39(7):1325–1332
Xue H, Chen S, Yang Q (2011) Structural regularized support vector machine: a framework for structural large margin classifier. IEEE Trans Neural Netw 22(4):573–587
Kim S, Park YJ, Toh K, Lee S (2010) SVM-based feature extraction for face recognition. Pattern Recognit 43(8):2871–2881
Chen Y, Su C, Yang T (2013) Rule extraction from support vector machines by genetic algorithms. Neural Comput Appl 23(3–4):729–739
Rosillo R, Giner J, Fuente D (2014) The effectiveness of the combined use of VIX and support vector machines on the prediction of S&P 500. Neural Comput Appl 22(2):321–332
Azar AT, El-Said SA (2014) Performance analysis of support vector machines classifiers in breast cancer mammography recognition. Neural Comput Appl 24(5):1163–1177
Angulo C, Ruiz F, Gonzalez L, Ortega JA (2006) Multi-classification by using tri-class SVM. Neural Process Lett 23:90–101
Shashua A, Levin A (2003) Ranking with large margin principle: two approaches. In: Becker S, Thrun S, Obermayer K (eds) Advances in neural information processing systems 15, MIT Press, Cambridge, pp 961–968
Zhao B, Wang F, Zhang C (2009) Block-quantized support vector ordinal regression. IEEE Trans Neural Netw 20(5):882–890
Pelckmans K, Karsmakers P, Suykens JAK, De Moor B (2006) Ordinal least squares support vector machines—a discriminant analysis approach. In: Proceedings of the machine learning for signal processing (MLSP 2006), pp 1–8
Li L, Lin HT (2007) Ordinal regression by extended binary classification. In: Advances in neural information processing systems 19. Proceedings of the 2006 conference (NIPS 2006). MIT Press, pp 865–872
Cardoso JS, Pinto JF (2007) Learning to classify ordinal data: the data replication method. J Mach Learn Res 8:1393–1429
Sun BY, Li J, Wu DD (2010) Kernel discriminant learning for ordinal regression. IEEE Trans Knowl Data Eng 22(6):906–910
Kramer KA, Hall LO, Goldgof DB, Remsen A, Luo T (2009) Fast support vector machines for continuous data. IEEE Trans Syst Man Cybern Part B Cybern 39(4):989–1001
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9:293–300
Suykens JAK, van Gestel T, De Brabanter J (2002) Least squares support vector machines. World Scientific, Singapore
Van Gestel T, Suykens JAK, Lanckriet G (2002) A Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis. Neural Comput 14(5):1115–1147
Adankon MM, Cheriet M (2009) Model selection for the LS-SVM. Application to handwriting recognition. Pattern Recognit 42(12):3264–3270
Adankon MM, Cheriet M, Biem A (2011) Semisupervised learning using Bayesian interpretation: application to LS-SVM. IEEE Trans Neural Netw 22(4):513–524
Evgeniou T, Pontil M, Poggio T (2001) Regularization networks and support vector machines. Adv Comput Math 13:1–50
Williams CKI (1998) Prediction with Gaussian processes: from linear regression to linear prediction and beyond. In: Jordan MI (ed) Learning in graphical models. Kluwer Academic Press, Dordrecht
Saunders C, Gammerman A, Vovk V (1998) Ridge regression learning algorithm in dual variables. In: Proceedings of the 15th International Conference on Machine Learning (ICML98), pp 515–521
Van Gestel T, Suykens JAK (2004) Benchmarking least squares support vector machine classifiers. Mach Learn 54:5–32
Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD 2001). ACM, New York, pp 77–86
Cevikalp H, Neamtu M, Barkana A (2007) The kernel common vector method: a novel nonlinear subspace classifier for pattern recognition. IEEE Trans Syst Man Cybern Part B Cybern 37(4):937–951
Müller K-R, Mika S, Rätsch G (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12(2):181–201
Lee YJ, Mangasarian OL (2001) SSVM: a smooth support vector machine. Comput Optim Appl 20(1):5–22
Bach FR, Jordan MI (2005) Predictive low-rank decomposition for kernel methods. In: Proceedings of the 22nd international conference on machine learning (ICML2005), pp 33–40
Gaudette L, Japkowicz N (2009) Evaluation methods for ordinal classification. In: Canadian AI 2009, LNAI, vol 5549. Springer, Berlin, pp 207–210
Waegeman W, De Baets B, Boullart L (2008) ROC analysis in ordinal regression learning. Pattern Recognit Lett 29(1):1–9
Baccianella S, Esuli A, Sebastiani F (2009) Evaluation measures for ordinal regression. In: 2009 Ninth international conference on intelligent systems design and applications
Acknowledgments
This research was supported by the Natural Science Foundation of Guangdong Province under Grants 2014A030310332 and 2014A030310414.
Appendix: Proof of Proposition 1
By obtaining the expressions for \(\xi _j^i\) and \(\tilde{\xi }_j^i\) from the equality constraints of (11) and substituting them into (10), the QP problem defined by (10) and (11) is converted to
subject to
Define the following Lagrangian:
with Lagrange multipliers \(\eta _j\ge 0\). From the condition \(\frac{\partial {{\mathcal {L}}}}{\partial {b_j}}=0\) we get
where \(\eta _0=0\). Since \(b_1<b_2<\ldots <b_{K-1}\), the ordering constraints hold strictly, and the KKT conditions therefore give \(\eta _j=0\) for \(j=1,2,\ldots ,K-2\). Hence Eq. (12) follows from Eq. (29).
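Spelling out this step (our reconstruction; we assume, consistently with the argument above, that each multiplier \(\eta _j\) is attached to an ordering constraint \(b_{j+1}-b_j\ge 0\)):

```latex
% Complementary slackness for the ordering constraints b_{j+1} - b_j >= 0:
\eta_j \,(b_{j+1} - b_j) = 0, \qquad j = 1,\ldots,K-2.
% The cut-points are strictly ordered, so b_{j+1} - b_j > 0 and
% therefore every multiplier must vanish: \eta_j = 0.
```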