
Neurocomputing

Volume 314, 7 November 2018, Pages 251-266

MSSBoost: A new multiclass boosting to semi-supervised learning

https://doi.org/10.1016/j.neucom.2018.06.047

Highlights

  • A new multiclass loss function is proposed for semi-supervised learning.

  • Pairwise similarity between points and classifier predictions are combined to build the loss function.

  • The boosting framework is used to derive a new boosting algorithm for multiclass semi-supervised learning from the resulting loss function, using gradient descent in functional space.

  • A new boosting algorithm is then proposed to learn the similarity functions in matrix functional space.

  • The resulting optimization problem minimizes the inconsistency between the similarity function and the classifier predictions.

  • The proposed loss function also minimizes the margin cost on labeled data as well as the pseudo-margin cost on unlabeled data.

  • The regular simplex vertices are used to formulate the multiclass classification problem.

Abstract

In this article, we focus on the multiclass classification problem in semi-supervised learning, i.e. learning from both labeled and unlabeled data points. We formulate multiclass semi-supervised classification as an optimization problem that combines the classifier predictions, based on the labeled data, with the pairwise similarity between data points. The goal is to minimize the inconsistency between the classifier predictions and the pairwise similarity. A boosting algorithm is proposed to solve the multiclass classification problem directly. The proposed multiclass approach uses a new formulation of the loss function with two terms: the first term is the multiclass margin cost on the labeled data, and the second term is a regularization term on the unlabeled data. The regularization term minimizes the inconsistency between the pairwise similarity and the classifier predictions; in effect, it assigns soft labels weighted by the similarity between unlabeled and labeled examples. First, a gradient descent approach is used to solve the resulting optimization problem and derive a boosting algorithm, named MSSBoost. The derived algorithm also learns an optimal similarity function for the given data. The second approach to solving the optimization problem applies coordinate gradient descent; the resulting algorithm is called CD-MSSB. We also use a variation of CD-MSSB in the experiments. The results of our experiments on a number of UCI and real-world text classification benchmark datasets show that MSSBoost and CD-MSSB outperform the state-of-the-art boosting methods for multiclass semi-supervised learning. Another observation is that the proposed methods effectively exploit informative unlabeled data.

Introduction

Supervised learning algorithms are effective mainly when there are sufficient labeled data points. However, in many real-world application domains, such as object detection, document and web-page categorization, and medical applications, labeled data are difficult, expensive, or time-consuming to obtain, because they typically require empirical research or experienced human annotators to assign labels [37]. Semi-supervised learning algorithms employ not only the labeled data but also the unlabeled data to build an adequate classification model. The goal of semi-supervised learning is to use unlabeled examples, combining the implicit information in the unlabeled data with the explicit classification information of the labeled data, to improve the classification performance. The main issue for semi-supervised learning algorithms is how to find a set of informative data points among the unlabeled data. A number of different algorithms have been proposed for semi-supervised learning, such as generative models [20], [27], self-training [37], [40], co-training [6], [34], the Transductive Support Vector Machine (TSVM) [18], Semi-Supervised SVM (S3VM) [4], graph-based methods [3], [29], [42], [43], and boosting-based semi-supervised learning methods [5], [7], [8], [23], [35], [36]. The main focus of this article is the boosting approach to multiclass semi-supervised learning.

The boosting framework is a popular approach to supervised learning. In boosting, a set of weak learners is used to build a strong classification model. It is therefore well-motivated to extend the boosting approach to semi-supervised classification problems. In [7] a boosting algorithm for semi-supervised learning, called MarginBoost, is presented using a new definition of the pseudo-margin for unlabeled data. [5] uses the same approach but a different pseudo-margin definition for unlabeled data (a common form is sketched below). The main issue in these approaches is that although they can improve the classification margin, they do not exploit information from the unlabeled examples, such as the similarity between examples or the marginal distribution. Consequently, the new classifier that is trained on newly-labeled examples is likely to share the same decision boundary with the first classifier instead of constructing a new one. The reason is that by adapting the decision boundary, poor predictions will not gain higher confidence; instead, the examples with high classification confidence will gain even higher confidence, see [8], [23], and [36]. More recently, new boosting methods have been proposed for semi-supervised classification problems, e.g. SemiBoost [23] and RegBoost [8], which use both the classifier predictions and the pairwise similarity to maximize the margin. In this approach the pairwise similarity information between labeled and unlabeled data is used to guide the resulting classifier to assign more reliable pseudo-labels to unlabeled examples. The experimental results show that these boosting approaches outperform the state-of-the-art methods in this field [19], [21], [46] and are comparable to LapSVM [3]. A key advantage of the boosting approach is that it can boost any type of base learner and is not limited to a specific one.
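For intuition, the pseudo-margin idea can be written compactly. The following is a hedged sketch of the common binary construction in this line of work, not necessarily the exact definitions of [5] or [7]: an unlabeled example is implicitly assigned the label the current ensemble prefers, so its margin reduces to the absolute value of the ensemble output.

```latex
% Binary setting: ensemble output f(x), labels y in {-1, +1}.
% Labeled points keep the usual margin y f(x); an unlabeled point
% takes the pseudo-label sign(f(x)), so its pseudo-margin becomes
% sign(f(x)) f(x) = |f(x)|.
\rho(x) =
\begin{cases}
  y\, f(x), & x \text{ labeled},\\
  |f(x)|,   & x \text{ unlabeled}.
\end{cases}
```

Maximizing |f(x)| rewards predictions the ensemble is already confident about, which is exactly why the decision boundary tends not to move, as discussed above.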

The aforementioned approaches were basically proposed to solve binary semi-supervised classification problems. Two main approaches can be used to handle multiclass classification problems. The first converts the multiclass problem into a set of binary classification problems; examples include one-vs-all, one-vs-one, and error-correcting output codes [2], [9]. This approach can suffer from various problems, such as imbalanced class distributions, increased complexity, no guarantee of an optimal joint classifier or probability estimates, and different scales for the outputs of the generated binary classifiers, which complicates combining them, see [26], [30], [44]. The second approach uses a multiclass classifier directly to solve the multiclass classification problem. Although a number of approaches have recently been presented for multiclass semi-supervised classification problems, e.g. [35], [36], [41], none of them has been shown to maximize the multiclass margin properly, which is the aim of this article.

The second important point in many promising semi-supervised learning approaches, especially those based on graphs or pairwise similarity, e.g. LapSVM [3], SemiBoost [23], RegBoost [8] and MSAB [35], [36], is that the performance of the algorithm strongly depends on the pairwise similarity function used. A good-quality similarity measure can significantly improve the classification performance of the learning algorithm. However, there is no unique method that can effectively measure the pairwise similarity between data points, so the aforementioned methods suffer from the lack of an adequate similarity function. Recently a number of methods have been proposed for distance/similarity learning in the context of classification, clustering, and information retrieval, see [15], [17], [33]. Most of these works learn a Mahalanobis distance function, e.g. [15], often using the parametric Mahalanobis distance in combination with K-means or EM clustering, as well as constraint-based approaches, in order to learn an optimized Mahalanobis function. Hillel and Weinshall [16] propose an approach for the specific case of distance/similarity learning over continuous variables under a Gaussian assumption. More recently, in [28] a semi-supervised metric learning method is presented using an entropy maximization approach, which optimizes the distance function by optimizing the probability distribution parameterized by that distance. A new boosting approach is proposed in [32] to learn a Mahalanobis distance for supervised learning. In this article, we propose a new form of the boosting framework for learning the optimal similarity function for the multiclass semi-supervised classification problem; a toy example of the kind of parametric similarity involved follows.
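As a concrete illustration of the parametric similarities discussed above, the following minimal Python sketch implements a Mahalanobis-style similarity K(x, x~; A) = exp(-(x - x~)^T A (x - x~)). This is a generic toy, not the similarity learned in this article: the choice of A and the PSD projection step are standard assumptions, not taken from the paper.

```python
import numpy as np

def mahalanobis_similarity(x, x_tilde, A):
    """Similarity K(x, x~; A) = exp(-(x - x~)^T A (x - x~)).

    A is a symmetric positive semi-definite matrix; A = I recovers
    the ordinary Gaussian (RBF) similarity.
    """
    d = x - x_tilde
    return np.exp(-d @ A @ d)

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by clipping
    negative eigenvalues; a standard step when A is updated by
    gradient descent and must remain a valid metric."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.clip(w, 0.0, None)) @ V.T

# Example: with A = I this is the usual Gaussian similarity.
x, x_t = np.array([1.0, 2.0]), np.array([1.5, 1.0])
print(mahalanobis_similarity(x, x_t, np.eye(2)))
```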

The main contribution of this article is a new loss function formulation for multiclass semi-supervised classification problems. Our approach uses the regular simplex vertices as a new formulation of the multiclass classification problem and combines the similarity information between labeled and unlabeled data with the classifier predictions to assign pseudo-labels to unlabeled data, using a new boosting formulation. We propose a new multiclass exponential loss function for semi-supervised learning, which includes two main terms: the first term is used to find a large-margin multiclass classifier, and the second term is a regularization term on the unlabeled data, which combines the pairwise similarity and the classifier predictions. The goal of the regularization term is to minimize the inconsistency between data points, meaning that similar data points must share the same class label. In fact, it assigns soft labels weighted by the similarity between unlabeled and labeled examples (a hedged sketch of a loss of this form is given below). Unlike the existing methods, which use a predefined similarity function, we propose a boosting framework to learn from weak similarity functions as well as weak base classifiers. To solve the resulting optimization problem, we first employ a functional gradient descent procedure and derive a boosting algorithm from the resulting loss function, named MSSBoost. The proposed boosting approach can boost any type of multiclass weak base classifier, e.g. a decision tree. At each boosting iteration, MSSBoost updates one multiclass ensemble predictor, and these updates minimize the loss function. We obtain the weighting factors for labeled and unlabeled data by solving the optimization problem. We also derive a boosting method for learning the similarity functions, which are used to guide the multiclass predictor to learn more from the unlabeled data. The second approach we use to solve the optimization problem is a functional coordinate gradient descent procedure, from which we obtain a boosting algorithm called CD-MSSB. This approach can boost any type of weak base learner, e.g. a decision stump. At each boosting iteration, CD-MSSB updates one component of the multiclass predictor. We also present a variation of CD-MSSB in the experiments. The experiments on a number of UCI [10] benchmark datasets and a set of real-world text classification datasets [38] show that MSSBoost and CD-MSSB outperform the state-of-the-art boosting methods for semi-supervised learning. The results also emphasize that MSSBoost and CD-MSSB can effectively exploit information from the unlabeled data to improve the classification performance.
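To make the structure of such a loss concrete, here is a hedged sketch consistent with the description above: simplex label codes, an exponential margin cost on labeled data, and a similarity-weighted regularizer on unlabeled data. The exact weighting and normalization of the paper's risk function may differ; λ, the index sets L and U, and the pairing of labeled with unlabeled points are illustrative assumptions.

```latex
% y_i in R^{M-1}: regular-simplex code of the label of x_i;
% f : X -> R^{M-1}: multiclass ensemble predictor;
% S(.,.): pairwise similarity; lambda > 0 trades off the two terms.
R(f, S) =
  \sum_{i \in L} \exp\big(-\langle y_i, f(x_i)\rangle\big)
  + \lambda \sum_{j \in U} \sum_{i \in L} S(x_i, x_j)\,
    \exp\big(-\langle y_i, f(x_j)\rangle\big)
% First term: multiclass margin cost on the labeled data.
% Second term: penalizes predictions on an unlabeled x_j that are
% inconsistent with the labels of similar labeled points x_i.
```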

The rest of this article is organized as follows. Section 2 addresses multiclass boosting on labeled and unlabeled data. Section 3 presents the resulting risk function. A variation of the proposed algorithm is discussed in Section 4. Section 5 discusses the time complexity, and Section 6 addresses related work. The experimental setup and results are presented in Section 7 (Experiments), Section 8 (Results), and Section 9 (MSSBoost for text classification); Section 10 presents the discussion and conclusion.

Section snippets

Multiclass supervised and semi-supervised learning

In this section we first review one of the current formulations of multiclass supervised learning using the boosting framework [30], and then extend it to multiclass semi-supervised classification problems. We formulate the problem as an optimization problem and use the gradient descent approach to solve it. This article is an extension of our previously presented work in [33].
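The regular-simplex label coding mentioned in the introduction can be constructed as follows. This is a minimal sketch of the standard construction (M unit-norm, equiangular vertices in R^(M-1)); the scaling conventions in [30] may differ.

```python
import numpy as np

def simplex_codes(M):
    """Vertices of a regular simplex in R^(M-1): one unit-norm code
    per class, pairwise inner products all equal to -1/(M-1).

    Construction: center the M standard basis vectors of R^M, then
    express them in an orthonormal basis of the hyperplane they span.
    """
    E = np.eye(M) - np.full((M, M), 1.0 / M)  # centered basis vectors
    U, _, _ = np.linalg.svd(E)                # basis of the (M-1)-dim span
    Y = E @ U[:, :M - 1]                      # M codes in R^(M-1)
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

Y = simplex_codes(3)
print(np.round(Y @ Y.T, 3))  # 1 on the diagonal, -0.5 off-diagonal
```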

Proposed risk function

We start by formulating the risk function of (9) as an optimization problem to find the optimal multiclass predictor and similarity function, respectively. This results in:

$$\min_{f,\,S}\; R_s(\hat{Y}, f, S) \quad \text{subject to} \quad f(x) = [f_1(x), \ldots, f_{M-1}(x)], \quad f_m \in \mathrm{Span}(H),\; m = 1, \ldots, M-1, \quad S \in \mathrm{Span}(K),$$

where $H = \{h_1(x), \ldots, h_p(x)\}$ is a set of weak classifiers with $h_i : X \to \mathbb{R}^{M-1}$, where $h_i$ can be any type of multiclass weak base learner, such as a decision tree or Naive Bayes, and $K = \{K_1(x, \tilde{x}; A_1), \ldots, K_q(x, \tilde{x}; A_q)\}$ is a set of real-valued similarity…
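The mechanics of "gradient descent in functional space" can be sketched in a few lines. The toy below makes strong simplifying assumptions: binary labels coded ±1, the exponential loss with a similarity-weighted unlabeled term, a fixed RBF similarity (not learned), and regression stumps fitted to the negative functional gradient. It illustrates the descent procedure, not MSSBoost itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy functional gradient descent on a semi-supervised exponential
# loss: R(f) = sum_L exp(-y_i f(x_i))
#            + lam * sum_U sum_L S_ij exp(-y_i f(x_j)).

def fit_ssboost(XL, yL, XU, lam=0.5, T=30, lr=0.5):
    # Fixed RBF similarity between labeled and unlabeled points.
    S = np.exp(-np.square(XL[:, None, :] - XU[None, :, :]).sum(-1))
    X, f, learners = np.vstack([XL, XU]), np.zeros(len(XL) + len(XU)), []
    for _ in range(T):
        fL, fU = f[:len(XL)], f[len(XL):]
        # Negative functional gradient of R at each training point.
        gL = yL * np.exp(-yL * fL)
        gU = lam * (S * np.exp(-np.outer(yL, fU)) * yL[:, None]).sum(0)
        h = DecisionTreeRegressor(max_depth=1).fit(X, np.r_[gL, gU])
        f += lr * h.predict(X)  # one descent step in function space
        learners.append(h)
    return learners

# Prediction: sum lr * h.predict(X_new) over learners, take the sign.
```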

Variations of MSSBoost

As mentioned in [33], coordinate gradient descent is used to handle the optimization problem (10). In this case, coordinate descent is applied for each class label, i.e. the mth coordinate. Therefore, to solve (10) for a given similarity function S, we use coordinate gradient descent in functional space; the goal is to find the optimal multiclass predictor. We then solve the problem in terms of the similarity function, using gradient descent in matrix space to find the optimal…
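For intuition, one coordinate step of this kind can be written as follows. This is a hedged sketch; the actual step size and weak-learner selection rule of CD-MSSB are derived in the paper. At iteration t only one component f_m of f = [f_1, …, f_{M-1}] moves, the rest are held fixed.

```latex
% Pick coordinate m, fit a weak learner h_t to the negative partial
% functional gradient of R_s with respect to f_m, then update:
f_m^{t+1} = f_m^{t} + \alpha_t h_t, \qquad
f_{m'}^{t+1} = f_{m'}^{t} \quad (m' \neq m),
% with \alpha_t chosen (e.g. by line search) to decrease R_s.
```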

Discussion of the time complexity

In this section, we give the details of the time complexity of the proposed algorithms. We first discuss the MSSBoost algorithm. As mentioned in Section 3, MSSBoost employs a multiclass weak base learner. It then computes the weights for the labeled and unlabeled examples. Next, MSSBoost selects the best weak multiclass learner to add to the ensemble classifier f_t such that it most decreases the risk function. After several iterations, MSSBoost starts to update the similarity function. In order to…

Comparison to previous related work

Using the boosting approach to handle semi-supervised learning problems has been addressed in several recent studies, see [5], [8], [23], [35], [41]. In this section we compare MSSBoost to the related methods as follows.

Bennett et al. [5] and d'Alché-Buc et al. [7] have proposed boosting frameworks to handle the binary semi-supervised classification problem. These methods introduce the pseudo-margin concept for unlabeled examples in their loss functions. The goal of these methods is to find a…

Experiments

In this section, we perform several experiments to compare the classification performance of MSSBoost to the state-of-the-art semi-supervised methods using several different datasets: synthetic, UCI [10], and real-world text classification datasets. We also set up several experiments to show the impact of using unlabeled data on the classification performance.

The first experiment is a comparison between CD-MSSB and OCD-MSSB. In this experiment, we also compare these two algorithms…

Results

The results of the experiments are shown in Tables 2–6. The tables consist of two parts: supervised and semi-supervised. The first part presents the classification accuracy of the supervised learning algorithms using only labeled data; the second part shows the classification performance of the semi-supervised learning algorithms. The base learners used in the experiments are Decision Stump, J48, Naive Bayes (NB), and SVM. In each table, the best classification performance is boldfaced…

MSSBoost for text classification problem

In this section, we evaluate the performance of the MSSBoost algorithm on the text classification problem using popular text datasets. The specifications of the datasets used are summarized in Table 7. Datasets re0 and re1 are derived from Reuters-21578 [22], and tr11, tr31, and tr45 are from TREC 5–7 [39]. These datasets have been widely used in the text classification literature, e.g. [14], [36]; we took them from [38]. We evaluate the classification performance of MSSBoost using J48…

Conclusion

In this paper, we proposed two multiclass boosting methods for semi-supervised learning, named CD-MSSB and MSSBoost. Our assumption is that labeled and unlabeled data with high similarity must share the same labels. Therefore, we combine the similarity information between labeled and unlabeled data with the classifier predictions to assign pseudo-labels to the unlabeled examples. We design a new multiclass loss function consisting of the multiclass margin cost on labeled data and the…

Acknowledgment

This research was partially supported by a grant from IPM (No. CS1396-4-69). We also thank the anonymous reviewers for their valuable comments.

References (47)

  • M.A. Bagheri et al.

    A subspace approach to error correcting output codes

    Pattern Recognit. Lett.

    (2013)
  • J. Tanha et al.

    Semi-supervised self-training for decision tree classifiers

    Int. J. Mach. Learn. Cybernet.

    (2017)
  • E.L. Allwein et al.

    Reducing multiclass to binary: a unifying approach for margin classifiers

    J. Mach. Learn. Res.

    (2001)
  • M. Belkin et al.

    Manifold regularization: a geometric framework for learning from labeled and unlabeled examples

    J. Mach. Learn. Res.

    (2006)
  • K. Bennett et al.

    Semi-supervised support vector machines

    NIPS

    (1999)
  • K. Bennett et al.

    Exploiting unlabeled data in ensemble methods

    Proceedings of ACM SIGKDD Conference

    (2002)
  • A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, Proceedings of the Eleventh..., ACM (1998)
  • F. d'Alché-Buc et al.

    Semi-supervised marginboost

    NIPS

    (2002)
  • K. Chen et al.

    Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions

    Pattern Anal. Mach. Intell.

    (2011)
  • T.G. Dietterich et al.

    Solving multiclass learning problems via error-correcting output codes

    J. Artif. Intell. Res.

    (1995)
  • A. Frank, A. Asuncion, UCI Machine Learning Repository, 2010,...
  • Y. Freund et al.

    Experiments with a new boosting algorithm

    ICML

    (1996)
  • J. Friedman et al.

    Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors)

    Ann. Stat.

    (2000)
  • M. Hall et al.

    The weka data mining software: an update

    SIGKDD Explor. Newsl.

    (2009)
  • E.H. Han et al.

    Centroid-based document classification: analysis and experimental results

  • T. Hertz et al.

    Boosting margin based distance functions for clustering

    ICML

    (2004)
  • A. Hillel, D. Weinshall, Learning distance function by coding similarity, ACM, 2007. Proceedings of the 24th...
  • S. Hoi et al.

    Semi-supervised distance metric learning for collaborative image retrieval

    CVPR

    (2008)
  • T. Joachims

    Transductive inference for text classification using support vector machines

    ICML

    (1999)
  • T. Joachims

    Transductive learning via spectral graph partitioning

    ICML

    (2003)
  • D.P. Kingma et al.

    Semi-supervised learning with deep generative models

    Advances in Neural Information Processing Systems

    (2014)
  • N. Lawrence et al.

    Semi-supervised learning via gaussian processes

    NIPS

    (2005)
  • D. Lewis, Reuters-21578 Text Categorization Test Collection Distribution, 1999,...

    Jafar Tanha was born in Bonab, Iran. He received the B.Sc. and M.Sc. degrees in computer science from AmirKabir University (Polytechnic), Tehran, Iran, in 1999 and 2001, respectively, and the Ph.D. degree in computer science (Artificial Intelligence) from the University of Amsterdam (UvA), Amsterdam, The Netherlands, in 2013. He was a researcher at the INL institute, Leiden, The Netherlands, from 2013 to 2015. Since 2015, he has been with the Department of Computer Engineering, Payame-Noor University, Tehran, Iran, where he is an Assistant Professor. He held lecturing positions at the Iran University of Science & Technology, Tehran, Iran, in 2016. His current position is IT manager at Payame-Noor University, Tehran, Iran. His main areas of research interest are machine learning, pattern recognition, data mining, and document analysis.

    He was a PC member of the 11th International Conference on E-learning (icelet 2017), held in Tehran, Iran.
