Elsevier

Neurocomputing

Volume 74, Issues 1–3, December 2010, Pages 447-456

Ordinal extreme learning machine

https://doi.org/10.1016/j.neucom.2010.08.022

Abstract

Recently, a fast learning algorithm called the Extreme Learning Machine (ELM) has been developed for Single-Hidden Layer Feedforward Networks (SLFNs) [G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing 70 (2006) 489–501], and ELM has been successfully applied to many classification and regression problems. In this paper, the ELM algorithm is further studied for ordinal regression problems (the resulting algorithm is named ORELM). We first propose an encoding-based framework for ordinal regression that includes three encoding schemes: a single multi-output classifier, multiple binary classifiers with the one-against-all (OAA) decomposition method, and multiple binary classifiers with the one-against-one (OAO) method. We then redesign the SLFN for ordinal regression based on the proposed framework, and train the resulting algorithms with the extreme learning machine, in which input weights are assigned randomly and output weights are determined analytically. Finally, extensive experiments on three kinds of datasets were carried out to test the proposed algorithm. Comparisons with traditional methods such as the Gaussian Process for Ordinal Regression (ORGP) and Support Vector Ordinal Regression (ORSVM) show that ORELM achieves extremely fast training speed and good generalization ability; this advantage becomes more apparent as the dataset size grows. Additionally, ORELM can learn in both online and batch modes and can handle non-linear data.

Introduction

In machine learning, classification and metric regression are two important supervised learning problems. Classification is a learning task that assigns to instances labels from a finite set of object classes, while metric regression predicts a value on a continuous scale for new instances. Ordinal regression, a setting bridging classification and metric regression, is the task of predicting variables on an ordinal scale. In contrast to metric regression, ordinal regression requires the grades to be discrete and finite. These grades also differ from the class labels in classification problems because of the ranking information among them. A good ordinal regression method should not only achieve good accuracy, but also offer fast training and prediction speed, the ability to handle both linear and non-linear problems, and good online learning ability.

Many machine learning algorithms have been proposed or redesigned for ordinal regression [1], [2], including the perceptron [3] and its kernelized generalization [4], the neural network with gradient descent [2], [6], the Gaussian process [7], [8], the large margin classifier [10], [11], [29], the k-partite classifier [12], the boosting algorithm [13], [14], the constraint classification [15], regression trees [16], Naive Bayes [17], Bayesian hierarchical experts [18], the binary classification approach [19], [20], and the optimization of nonsmooth cost functions [21]. Although these ordinal regression methods have some interesting properties, they also have certain disadvantages. For example, PRank [14] is a fast online algorithm, but its accuracy suffers on non-linear data. Large-margin methods [22] are accurate, but they convert the ordinal relations into O(N2) pairwise ranking constraints (N is the number of data points) and are thus impractical for large datasets. Ordinal support vector machines (ORSVM) [11], [10] are other powerful large-margin methods that find m−1 real-valued thresholds (m is the number of ranks); the complexity of their optimization problem is O(N), but prediction is slow when the set of support vectors is not sparse. Similarly, the Gaussian process method for ordinal regression (ORGP) [7] also has difficulty handling large datasets. Therefore, better methods for the ordinal regression problem are still needed.

Most of these ordinal regression algorithms rely on iterative learning of thresholds; for example, ORGP combines the Gaussian process (GP) with threshold-based learning, and ORSVM combines the support vector machine (SVM) with threshold-based learning. These threshold-based approaches are usually relatively complex and time-consuming because of their iterative learning steps. To increase learning speed, we have to give up this iterative approach. The encoding approach has been shown to be effective for multi-class classification by the experimental results reported in Allwein's paper [9]. However, Allwein only discussed the encoding framework for multi-class classification (error-correcting output coding, ECOC); ordinal regression differs from classification in that an ordinal relationship exists among the categories. When designing encodings for classification, this ordinal relationship need not be considered; for ordinal regression, it must be. In this paper, we propose three encoding schemes for ordinal regression that reflect these ordinal relationships and that differ from the codings used for multi-class classification. For example, taking the order of the categories into account, if a data point x belongs to the kth rank, it automatically belongs to the lower-order categories (1, 2, …, k−1). The target output of x is therefore encoded as t = [1, 1, ..., 1, −1, −1, ..., −1], where t_i (1 ≤ i ≤ k) is set to "1" and all other elements to "−1". In a classification problem, by contrast, a data point x belonging to the kth category cannot be assigned to any other category, so its target output is encoded as t = [−1, −1, ..., 1, −1, −1, ..., −1]. This is our first contribution. The proposed framework can also be combined with SVM, RVM and other learners.
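To make the contrast between the two target encodings concrete, they can be sketched in a few lines of Python (the function names are ours, for illustration only, not from the paper):

```python
import numpy as np

def ordinal_targets(rank, num_ranks):
    """Cumulative encoding for ordinal regression: a point of rank k also
    belongs to every lower-order category, so t_i = +1 for i <= k and -1 otherwise."""
    t = -np.ones(num_ranks)
    t[:rank] = 1.0
    return t

def onehot_targets(rank, num_ranks):
    """Conventional classification encoding: only the k-th entry is +1."""
    t = -np.ones(num_ranks)
    t[rank - 1] = 1.0
    return t

# A data point of rank 3 out of 5 ranks:
print(ordinal_targets(3, 5))  # [ 1.  1.  1. -1. -1.]
print(onehot_targets(3, 5))   # [-1. -1.  1. -1. -1.]
```

The cumulative code makes adjacent ranks differ in exactly one bit, so the Hamming distance between two codewords grows with the distance between their ranks, which is how the ordering information enters the encoding.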
To obtain a fast learning algorithm, we select the Extreme Learning Machine [23] and propose three ELM-based ordinal regression methods by designing three different output encoding schemes [9] and the corresponding model structures: learning with a single multi-output ELM (ORELM-SingleELM), multiple binary ELMs with the one-against-all scheme (ORELM-OAA), and multiple binary ELMs with the one-against-one scheme (ORELM-OAO). This is our second contribution. Finally, we analyze and test the proposed schemes against popular threshold-based ones, and show that our schemes are very fast and especially suitable for real-time applications, e.g., user model tracking and personalized recommendation. The proposed methods inherit ELM's advantages, such as learning in both online [28], [30], [31], [32], [33] and batch modes [23], training on very large datasets, handling non-linear data, and extremely fast training. Comparative experiments are carried out to demonstrate their performance.

The rest of the paper is arranged as follows. In Section 2, related work, including ELM and ECOC, is introduced. The proposed ELM schemes for ordinal regression, ORELM-SingleELM, ORELM-OAA and ORELM-OAO, are presented in detail in Section 3. In Section 4, the schemes' performances are evaluated and compared with existing ones. Finally, conclusions are drawn and discussions are given in Section 5.

Section snippets

Extreme learning machine (ELM)

Recently, a fast learning algorithm, the Extreme Learning Machine (ELM), has been developed for Single-Hidden Layer Feedforward Networks (SLFNs) [23]. In ELM, the input weights and biases are randomly assigned, and the output weights are determined analytically by a simple generalized-inverse operation [5]. Compared with traditional learning machines [34], [35], ELM not only learns much faster with higher generalization ability but also avoids many difficulties, such as the stopping

ELM for ordinal regression

Motivated by the ECOC-based unified framework for multi-class classification [9], we propose an encoding-based framework for ordinal regression. The key difference is that ordinal regression has to consider the ordinal relationship among the categories; we give the details in the following section. Based on the proposed framework, many supervised learning algorithms such as ELM, the Relevance Vector Machine (RVM) [36] and the Support Vector Machine (SVM) [34] can easily be applied to ordinal
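At prediction time, the cumulative target encoding (t_i = +1 for i ≤ k, −1 otherwise) introduced earlier can be decoded by counting positive outputs; this is an illustrative sketch of one plausible decoding rule, not necessarily the exact rule the paper uses for each scheme:

```python
import numpy as np

def decode_rank(outputs):
    """Under the cumulative encoding, a point of true rank k should produce
    k positive outputs, so the predicted rank is the number of positive
    entries, clamped to at least rank 1."""
    return max(1, int(np.sum(np.asarray(outputs) > 0)))

print(decode_rank([0.9, 0.7, 0.2, -0.6, -0.8]))  # rank 3
```

Counting positives rather than trusting any single output makes the decoding tolerant of one inconsistent binary learner, in the spirit of ECOC decoding.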

Performance evaluation

In this section, we test the proposed schemes, i.e., ORELM-SingleELM, ORELM-OAO and ORELM-OAA, on a number of benchmark datasets for ordinal regression, and compare them with existing work, ORGP [7] and ORSVM [11]. The simulations for ORELM-SingleELM, ORELM-OAO and ORELM-OAA are carried out in a MATLAB 8.1 environment running on a Pentium 4, 2.7 GHz CPU. The simulations for ORGP and ORSVM are carried out using compiled C-coded ORGP and ORSVM

Conclusions

Different from the threshold-based ordinal regression frameworks that find a model and the corresponding rank thresholds, e.g., ORSVM and ORGP, we proposed an encoding-based ordinal regression framework and three ELM-based ordinal regression algorithms. In our algorithms, three output coding matrices are first designed for ORELM-SingleELM, ORELM-OAA and ORELM-OAO, respectively. Then, one or more ELMs are used to construct the prediction model. Compared with threshold-based algorithms, the

Acknowledgments

The work was partially supported by the National Science Foundation of China under Grant nos. 60825202, 60803079, 61070072 and 60633020, the National High-Tech R&D Program of China under Grant no. 2008AA01Z131, the National Key Technologies R&D Program of China under Grant nos. 2006BAK11B02, 2009BAH51B02 and 2006BAJ07B06, the Science Research Plan of Shaanxi Provincial Department of Education under Grant no. 09JK717, and the Key Projects in the National Science under Grant no. 2009BAH51B00,


References (36)

  • C.J.C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, G. Hullender, Learning to rank using gradient...
  • W. Chu et al., Gaussian processes for ordinal regression, Journal of Machine Learning Research (2005)
  • A. Schwaighofer, V. Tresp, K. Yu, Hierarchical Bayesian modelling with Gaussian processes, in: Proceedings of the...
  • E.L. Allwein et al., Reducing multiclass to binary: a unifying approach for margin classifiers, Journal of Machine Learning Research (2001)
  • A. Shashua, A. Levin, Ranking with large margin principle: two approaches, in: Proceedings of the Advances in Neural...
  • W. Chu et al., Support vector ordinal regression, Neural Computation (2007)
  • S. Agarwal, D. Roth, Learnability of bipartite ranking functions, in: Proceedings of the 18th Annual Conference on...
  • Y. Freund et al., An efficient boosting algorithm for combining preferences, Journal of Machine Learning Research (2003)

    Wanyu Deng received his B.S. degree in Computer Science and Technology in 2001, his M.S. degree in Software Technology and Theory in 2004 from Northwest Polytechnical University, China. He is now a Ph.D. candidate in the Department of Computer Science and Technology in Xi’an Jiaotong University, China. His research interests include machine learning, collaborative filtering and personalized service. He has participated in several national research projects and got FUJIXEROX scholarships from the Xi’an Jiaotong University. He is the author or co-author of more than 10 refereed international journal and conference papers covering topics of collaborative filtering, personalized service and artificial intelligence.

    Qinghua Zheng received his B.S. degree in computer software in 1990, his M.S. degree in computer organization and architecture in 1993, and his Ph.D. degree in system engineering in 1997, all from Xi’an Jiaotong University, China. He is a professor with the Department of Computer Science and Technology in Xi’an Jiaotong University. He serves as the dean of the Department of Computer Science and Technology, and vice dean of the E-Learning School of Xi’an Jiaotong University. His research areas include multimedia distance education, intelligent e-learning theory and algorithms, and computer network security. He was a Postdoctoral Researcher at Harvard University from February 2002 to October 2002 and a Visiting Research Professor at Hong Kong University from November 2004 to January 2005. He has published more than 90 papers and holds 13 patents. He won the National Science Fund for Distinguished Young Scholars in 2008. He received the First Prize for National Teaching Achievement from the State Education Ministry in 2005, and the First Prize for Scientific and Technological Development of Shanghai City and Shaanxi Province in 2004 and 2003, respectively. To date, he has led more than 10 national fund projects, including the National Natural Science Foundation of China, National 863 Major Programs, and Key (Key grant) Projects of the Chinese Ministry of Education. He is a member of the Education Subcommittee of the National Standard Committee of Information Technology and a member of the IEEE.

    Shiguo Lian received his B.S. and Ph.D. degrees from the Nanjing University of Science and Technology, China. He was a research assistant at the City University of Hong Kong in 2004. Since July 2005, he has been a Research Scientist with France Telecom R&D (Orange Labs) Beijing. He is the author or co-author of more than 80 refereed international journal and conference papers covering topics of secure multimedia communication, intelligent multimedia analysis, and ubiquitous communication. He is the author/editor of 5 published books, has contributed 15 chapters to books, and has filed 14 patents. He received the Nomination Prize of the “Innovation Prize in France Telecom” and the “Top 100 Doctorate Dissertation in Jiangsu Province” award in 2006. He is a member of the IEEE ComSoc Communications & Information Security Technical Committee (CIS TC), the IEEE Multimedia Communications Technical Committee (MMTC), and the IEEE Technical Committee on Nonlinear Circuits and Systems (TC NCAS). He is on the editorial board of several international journals, and is a guest editor of Informatica, Soft Computing, Neural Network World, Applied Soft Computing, Intelligent Automation and Soft Computing (AutoSoft), Telecommunication Systems, and Computer Communications, among others. He serves on the organization committees of several refereed conferences, and is a reviewer for many refereed international journals and magazines.

    Lin Chen received his B.S. degree in Computer Science and Technology in 1999 from Shaanxi Normal University, and his M.S. degree in Software Technology and Theory in 2005 from Northwest Polytechnical University, China. He is a lecturer in the Department of Computer Science and Technology at the Xi’an Institute of Posts and Telecommunications, China. His research interests include collaborative filtering and personalized service.
