Knowledge-Based Systems

Volume 159, 1 November 2018, Pages 203-220

Robust semi-supervised extreme learning machine

https://doi.org/10.1016/j.knosys.2018.06.029

Highlights

  • To effectively exploit the geometric information embedded in unlabeled data via the manifold regularization term.

  • To have a good ability to reduce the negative influence of outliers by exploiting the non-convex loss function.

  • To demonstrate the robustness of RSS-ELM in theory from the perspective of reweighting.

  • To be efficiently solved by the well known CCCP method.

  • To validate RSS-ELM by comparing it with several related algorithms on multiple image datasets and UCI datasets.

Abstract

The existence of outliers among labeled data is a major challenge for semi-supervised learning. An effective way to handle this problem is to employ non-convex loss functions, which assign constant penalties to outliers to avoid their negative influence. Along this line, in this paper, by adopting a non-convex squared loss function, we propose a novel robust semi-supervised learning algorithm that overcomes the sensitivity to outliers of the classical semi-supervised extreme learning machine (SS-ELM), termed robust SS-ELM, or RSS-ELM for short. After expressing the non-convex squared loss function as a difference of two convex ones, RSS-ELM is solved effectively with the help of the concave-convex procedure (CCCP). Concretely, RSS-ELM builds its output function iteratively, solving one linear system per iteration. Moreover, we analyze the computational complexity of RSS-ELM, and prove its convergence and robustness from a theoretical point of view. The proposed RSS-ELM includes the conventional ELM and SS-ELM as special cases. Extensive experiments conducted across multiple image datasets and benchmark datasets validate that RSS-ELM not only inherits the advantages of semi-supervised learning, but also enjoys the merit of robustness.

Introduction

As a new and efficient training algorithm for single-layer feedforward neural networks (SLFNs), the extreme learning machine (ELM) [1], [2] has recently gained substantial attention in many research areas. By randomly generating the input weights and biases of the hidden layer and adopting the squared loss function, ELM calculates the output weights analytically under the framework of regularized least squares (RLS). It has been shown that ELM with randomly generated hidden neurons exhibits the universal approximation capability [3]. Owing to its excellent generalization ability, prediction accuracy and learning speed, ELM has been successfully applied to a wide range of domains [4], [5], [6], [7], [8], [9], [10], [11], including classification, regression, clustering, representational learning and so on. Moreover, recent studies have shown that ELM achieves comparable or even better prediction accuracy than the classical support vector machine (SVM) [12], a maximal-margin classifier derived under the framework of structural risk minimization.
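For concreteness, here is a minimal NumPy sketch of this training scheme: fix a random hidden layer, then solve the RLS problem for the output weights in closed form. The tanh activation, Gaussian initialization and all names below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def elm_train(X, Y, n_hidden=100, c=1.0, seed=0):
    """Minimal ELM sketch: random hidden layer, output weights by RLS."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random biases (never trained)
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    # Analytic output weights under regularized least squares:
    # beta = (H^T H + I/c)^{-1} H^T Y
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / c, H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```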

Most existing studies on ELM are supervised ones whose effectiveness depends on sufficient supervised information. However, in many real-world applications, such as text classification, natural language processing and information retrieval, acquiring sufficient labeled data is usually difficult, while unlabeled data are easy to collect and available in large quantities. In such situations, the performance of supervised learning (SL) algorithms usually deteriorates, since much of the information carried by unlabeled data is simply ignored. To handle this problem, semi-supervised learning (SSL) [13], which exploits a large amount of unlabeled data to complement the insufficient labeled data and ultimately improve the performance of classifiers, has arisen. It is therefore natural to introduce ELM into the framework of SSL, which can greatly expand its applicability.

Manifold regularization (MR) [14], [15], [16], which captures geometric information from both the labeled and unlabeled data and enforces smoothness of the classifier along the intrinsic manifold via an additional regularization term, has been widely used in SSL. Following the MR framework, a number of ELM-based SSL algorithms [17], [18], [19] have been presented that make full use of the geometric information to achieve better generalization performance. Moreover, some ELM-based SSL algorithms have been combined with other research directions, such as multi-view learning, online learning, deep learning and so on. In this paper, we mainly focus on the classical semi-supervised ELM (SS-ELM) [19], which not only extends ELM to SSL based on the MR framework but also inherits the learning capability and computational efficiency of ELM.
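The MR framework typically encodes the geometric information in a graph Laplacian built over both labeled and unlabeled samples. A minimal sketch of one common construction (a k-NN graph with Gaussian edge weights; the parameters are illustrative, not the paper's exact choice) follows:

```python
import numpy as np

def knn_laplacian(X, k=5, sigma=1.0):
    """Graph Laplacian L = D - W from a k-NN graph with Gaussian edge weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]                      # k nearest neighbors, skip self
        W[i, idx] = np.exp(-d2[i, idx] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                                    # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W                         # unnormalized Laplacian
```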

It is well known that the quality of the finite labeled data is crucial for SSL, since SSL always assumes that the input label information is completely reliable. In practice, however, outliers among the labeled data (label noise) are commonly encountered, and their wrong labels easily propagate to nearby unlabeled data, resulting in major classification errors [20]. Although SS-ELM has produced good performance, it suffers from the same drawback: its performance worsens when there are outliers among the labeled data. In fact, SS-ELM fits the regularization learning framework $\mathrm{Reg}(w) + c\sum_{i=1}^{l} v(f(x_i), y_i) + \lambda \|f\|_I^2$ with training data $x_i \in \mathbb{R}^d$ and labels $y_i \in \{-1, 1\}$. Here, $\mathrm{Reg}$ is the regularization term to avoid overfitting [10], $v$ stands for the squared loss function that minimizes the empirical error, $\|f\|_I^2$ is the smoothness penalty corresponding to the sample probability distribution, and the factors $c$ and $\lambda$ are two positive regularization parameters.
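Written out, the learning problem above reads as follows; the graph-Laplacian form of the smoothness penalty in the last term is the standard instantiation in the MR literature and is stated here only for concreteness:

```latex
\min_{f}\ \mathrm{Reg}(w) + c\sum_{i=1}^{l} v\big(f(x_i), y_i\big) + \lambda\,\|f\|_I^2,
\qquad v\big(f(x_i), y_i\big) = \big(y_i - f(x_i)\big)^2,
\qquad \|f\|_I^2 = \mathbf{f}^{\top} L\,\mathbf{f}
```

where $\mathbf{f}$ stacks the outputs of $f$ on all labeled and unlabeled training points and $L$ is a graph Laplacian built from the data (see the sketch above).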

By its nature, the squared loss function is convex and unbounded. Its convexity explains why it is so widely adopted: in addition to computational advantages, convex functions are amenable to theoretical analysis. Its unboundedness, however, means that the penalties assigned to outliers may be huge, so that the decision hyperplane of SS-ELM deviates from the optimal position and the generalization performance of SS-ELM is ultimately damaged. In other words, outliers tend to incur much larger errors than normal data and exert more influence in determining the final classifier. It is therefore of great significance to develop robust SS-ELM algorithms that can mitigate the negative effect of outliers among the labeled data.
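A quick numeric contrast makes the point. The truncation threshold s below is an assumed illustration of the bounded, non-convex squared loss adopted later in the paper:

```python
import numpy as np

errors = np.array([0.5, 1.0, 5.0, 50.0])      # the last two mimic outliers
s = 2.0                                        # truncation threshold (illustrative)
squared = errors ** 2                          # unbounded: outliers dominate the sum
truncated = np.minimum(errors ** 2, s ** 2)    # bounded: constant penalty beyond s
print(squared.tolist())    # [0.25, 1.0, 25.0, 2500.0]
print(truncated.tolist())  # [0.25, 1.0, 4.0, 4.0]
```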

In recent years, non-convex loss functions have received great attention for the important role they play in machine learning. As shown in previous studies [21], [22], [23], [24], [25], [26], [27], substituting non-convex loss functions for convex ones is a widely adopted strategy for improving the robustness of SL algorithms, which consider only labeled data when training their models. Specifically, in classification scenarios, scholars [21], [22], [23] have adopted non-convex loss functions and used iteratively reweighted least squares (IRLS), the CCCP approach, semi-definite programming and convex relaxation techniques to solve the optimization problems involved in the proposed robust SL algorithms. Moreover, researchers [24], [25], [26], [27] have applied similar ideas to construct robust ELM-based and SVM-based algorithms for regression estimation. It is worth noting that most of these studies focus on the robustness of SL algorithms; there is almost no work on improving the robustness of ELM-based SSL algorithms from the perspective of loss functions.

In this paper, inspired by the studies above, we devote ourselves to enhancing the robustness of the classical SS-ELM [19] by adopting a novel non-convex squared loss function. Our motivation rests mainly on the fact that the classical squared loss function assigns the same importance to all error samples, i.e., all errors contribute in the same way to the final solution. Consequently, outliers tend to exert more influence in determining the position of the decision hyperplane. In addition, in semi-supervised classification scenarios the wrong labels of outliers easily propagate to nearby unlabeled data. In light of these phenomena, we adopt the non-convex squared loss function, which assigns constant penalties to outliers. We then propose the robust version of SS-ELM for semi-supervised binary classification, termed robust SS-ELM, or RSS-ELM for short. Like SS-ELM, RSS-ELM can make full use of abundant unlabeled data; moreover, it is more robust in the presence of outliers among the labeled data. However, the non-convexity of the adopted loss function means that traditional convex optimization methods cannot be applied directly. To remedy this problem, we rewrite the non-convex squared loss function as a difference of two convex ones and obtain a difference-of-convex-functions (DC) program [28]. The CCCP [29], [30] approach is then employed to transform the DC program into a sequence of linear systems. Experimental results on multiple datasets indicate that the proposed algorithm is robust against outliers.
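To make the mechanics concrete, the sketch below applies CCCP to a ridge-regularized model with the truncated squared loss min(e², s²), written as the DC decomposition e² − max(e² − s², 0); linearizing the subtracted convex part at the current residuals reduces each iteration to a single linear system. This is a generic illustration under assumed names and a plain feature matrix H, not the paper's full RSS-ELM (which additionally carries the manifold term):

```python
import numpy as np

def cccp_truncated_sq(H, y, c=1.0, s=1.0, max_iter=20, tol=1e-6):
    """CCCP for: min_beta (1/c)*||beta||^2 + sum_i min(e_i^2, s^2), e = y - H @ beta.
    DC split: min(e^2, s^2) = e^2 - max(e^2 - s^2, 0)."""
    L = H.shape[1]
    A = H.T @ H + np.eye(L) / c
    beta = np.linalg.solve(A, H.T @ y)           # initialize with plain regularized LS
    for _ in range(max_iter):
        e = y - H @ beta
        # Gradient of the subtracted convex part g(e) = max(e^2 - s^2, 0):
        # zero for inliers (|e| <= s), 2e for outliers.
        delta = np.where(np.abs(e) > s, 2.0 * e, 0.0)
        # Convex subproblem solved in closed form:
        # (H^T H + I/c) beta = H^T (y - delta/2)
        beta_new = np.linalg.solve(A, H.T @ (y - 0.5 * delta))
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Note that for samples with |e| > s, the linearized correction replaces the sample's target by its current prediction, so outliers stop pulling on the solution; this is the reweighting view of robustness mentioned in the highlights.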

In summary, by incorporating the properties of SSL and non-convex loss functions, the advantages of the proposed RSS-ELM are:

• To effectively exploit the geometric information embedded in unlabeled data;

• To have a good ability to reduce the negative influence of outliers by exploiting the non-convex loss function;

• To allow the robustness of RSS-ELM to be proved theoretically from the reweighting perspective;

• To be efficiently solved by the well known CCCP method;

• To achieve a fast convergence rate. In our experiments, the learning procedure always converges within limited iterations.

The remainder of this paper is organized as follows. The basic setups of ELM and SS-ELM are briefly reviewed in Section 2. In Section 3, the non-convex squared loss function is succinctly described and the detailed derivation of the proposed RSS-ELM is shown. Moreover, some discussions about RSS-ELM are given. After presenting the experimental results on multiple datasets in Section 4, we conclude this paper in Section 5.

Section snippets

Preliminaries

In this section, we provide brief reviews of ELM and SS-ELM, which form the underlying basis of the proposed algorithm.

RSS-ELM

Although the SS-ELM presented above shows better properties than the plain ELM in semi-supervised environments, it loses robustness due to its use of the classical squared loss function. To solve this problem, we propose the robust SS-ELM (RSS-ELM) for semi-supervised binary classification by adopting a non-convex squared loss function that assigns constant penalties to outliers. In this section, the non-convex squared loss function is presented first, then the formulation and solution of…

Experiments

In this section, the validity of the proposed RSS-ELM for semi-supervised binary classification is examined through a series of experiments on image datasets and benchmark datasets, both without and with outliers. After the experimental setting is given in Section 4.1, we carefully analyze and compare the performance of RSS-ELM with several related algorithms on different datasets in Section 4.2.

Conclusion and future work

In this paper, we introduce a non-convex loss function into SS-ELM to construct a robust SSL algorithm, termed robust SS-ELM, or RSS-ELM for short. By using the non-convex squared loss function, which assigns constant penalties to outliers, the newly proposed algorithm can suppress the negative influence of outliers among labeled data and solve practical semi-supervised problems in the presence of outliers. In practice, after introducing the employed loss function, we elaborate the…

Acknowledgments

The work is supported by the National Natural Science Foundation of China (Grant Nos. 11171346 and 11626186). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References (37)

  • Z. Wang et al., Multi-view learning with universum, Knowl. Based Syst. (2014)
  • S. Gourvénec et al., An evaluation of the PoLiSh smoothed regression and the Monte Carlo cross-validation for the determination of the complexity of a PLS model, Chemom. Intell. Lab. Syst. (2003)
  • J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006)
  • G. Huang et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2012)
  • G. Huang et al., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. (2006)
  • H. Yu et al., ODOC-ELM: Optimal decision outputs compensation-based extreme learning machine for classifying imbalanced data, Knowl. Based Syst. (2016)
  • L. Kasun et al., Representational learning with ELMs for big data, IEEE Intell. Syst. (2013)
  • P. Escandell-Montero et al., Online fitted policy iteration based on extreme learning machines, Knowl. Based Syst. (2016)