Knowledge-Based Systems

Volume 37, January 2013, Pages 137-145

A hybrid generative/discriminative method for semi-supervised classification

https://doi.org/10.1016/j.knosys.2012.07.020

Abstract

Training methods for machine learning are often characterized as generative or discriminative. We present a new co-training style algorithm that employs a generative classifier (Naive Bayes) and a discriminative classifier (Support Vector Machine) as base classifiers, so as to take advantage of both methods. Furthermore, we introduce a pair of weight parameters to balance the impact of labeled and pseudo-labeled data, and define a hybrid objective function to tune their values during co-training. The final prediction is given by a combination of the base classifiers, and we define a pseudo-validation set to regulate their combination weight. Additionally, we present a strategy for selecting pseudo-labeled data that deals with the class imbalance problem. Experimental results on six datasets show that our method outperforms the compared methods, especially when the amount of labeled data is small.

Introduction

Most machine learning algorithms rely on the availability of sufficient labeled data. However, labeled data are often expensive to obtain in many application domains, while unlabeled data are readily available in abundance. As a consequence, semi-supervised learning (SSL), which attempts to learn from both labeled and unlabeled data, has attracted much attention in recent years.

A variety of SSL algorithms have been proposed for either generative or discriminative classifiers. In a probabilistic framework, generative classifiers model the joint probability p(x, y) of the input x and the label y, and make predictions by using Bayes' rule to calculate p(y|x). In contrast, discriminative classifiers directly model the posterior class probability p(y|x), and usually achieve better performance than generative ones. However, when the amount of labeled data is small, generative classifiers can provide better accuracy even when their models do not fit the data well [1], [2], [3]. Additionally, discriminative classifiers cannot naturally incorporate unlabeled data in SSL, because they do not model p(x) as generative ones do.
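
To make the distinction concrete, the following minimal sketch (our illustration, not code from the paper) shows how a generative classifier such as multinomial Naive Bayes turns its joint model p(x, y) = p(y)p(x|y) into the posterior p(y|x) via Bayes' rule; all names are illustrative assumptions:

```python
import numpy as np

def nb_posterior(x, class_priors, word_probs):
    """Posterior p(y | x) for a multinomial Naive Bayes text model.

    x            : word-count vector, shape (V,)
    class_priors : p(y), shape (C,)
    word_probs   : p(w | y), shape (C, V); each row sums to 1
    """
    # generative model: log p(x, y) = log p(y) + sum_w x_w * log p(w | y)
    log_joint = np.log(class_priors) + np.log(word_probs) @ x
    # Bayes' rule: p(y | x) = p(x, y) / sum_y' p(x, y')
    log_joint -= log_joint.max()          # for numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum()
```

A discriminative classifier would instead parameterize p(y|x) directly, which is why p(x), and with it the unlabeled data, drops out of its objective.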

To leverage the power of both generative and discriminative methods, several hybrid methods have been proposed in recent years. Most of these methods define a probabilistic hybrid objective function in which the parameter distributions of the generative part and the discriminative part must belong to the same family, e.g. PCP [2], or even be identical, e.g. MCL [4]. As a result, some prominent classifiers, such as the Support Vector Machine (SVM), can hardly be incorporated into these hybrid methods.

Co-training is a popular SSL algorithm which employs two Naive Bayes (NB) classifiers as base classifiers that retrain each other iteratively [5], [6]. Owing to the prominent performance of the SVM [7], [8], several researchers have recently replaced the NBs with SVMs in co-training style algorithms [8], [9], [10]. However, the SVM is very sensitive to noise, which is inevitable among the pseudo-labeled data of co-training style algorithms. Additionally, the SVM performs worse when classes overlap strongly.
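
For readers unfamiliar with the scheme, a bare-bones single-view co-training loop looks roughly like the sketch below; the confidence-based selection, number of rounds, and per-round quota are illustrative assumptions, not the settings of [5], [6]:

```python
import numpy as np

def co_train(clf_a, clf_b, X_lab, y_lab, X_pool, rounds=10, k=5):
    """Bare-bones co-training: each round, each classifier pseudo-labels
    its k most confident pool examples for the *other* classifier.
    clf_a and clf_b must implement fit(X, y) and predict_proba(X),
    e.g. sklearn's MultinomialNB or SVC(probability=True)."""
    Xa, ya = list(X_lab), list(y_lab)   # training set grown for clf_a
    Xb, yb = list(X_lab), list(y_lab)   # training set grown for clf_b
    pool = list(X_pool)
    for _ in range(rounds):
        if not pool:
            break
        clf_a.fit(np.array(Xa), np.array(ya))
        clf_b.fit(np.array(Xb), np.array(yb))
        # clf_a teaches clf_b, then clf_b teaches clf_a
        for clf, X_other, y_other in ((clf_a, Xb, yb), (clf_b, Xa, ya)):
            if not pool:
                break
            proba = clf.predict_proba(np.array(pool))
            picks = np.argsort(proba.max(axis=1))[-k:]   # most confident
            for i in sorted(picks, reverse=True):        # pop high indices first
                X_other.append(pool[i])
                y_other.append(int(proba[i].argmax()))
                pool.pop(i)
    # final fit on the augmented training sets
    clf_a.fit(np.array(Xa), np.array(ya))
    clf_b.fit(np.array(Xb), np.array(yb))
    return clf_a, clf_b
```

The noise sensitivity noted above enters exactly at the pseudo-labeling step: once a mislabeled example is appended, both classifiers keep training on it.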

We present a new co-training style algorithm, named Co-NB-SVM, that flexibly combines a generative classifier (NB) and a discriminative classifier (SVM). Unlike conventional co-training algorithms, our method does not require two sufficient and redundant views (i.e., attribute sets). To balance the impact of labeled and pseudo-labeled data, we introduce a pair of weight parameters for the pseudo-labeled data; this avoids the plague of local maxima (minima) that afflicts probabilistic co-training style algorithms [6], [8]. Furthermore, we define a hybrid objective function to estimate the optimal values of the weight parameters during co-training.
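
The hybrid objective and the exact form of the weight parameters are defined in the body of the paper and are not excerpted here. Purely as an illustration of the underlying idea, a discount factor lam can down-weight every pseudo-labeled example relative to the labeled ones via per-instance weights; the helper name and signature below are hypothetical:

```python
import numpy as np

def fit_with_pseudo(clf, X_lab, y_lab, X_pseudo, y_pseudo, lam):
    """Fit clf on labeled plus pseudo-labeled data, discounting each
    pseudo-labeled example by weight lam in (0, 1]; lam = 1 treats
    pseudo-labels as real labels.  Works with any scikit-learn
    estimator whose fit() accepts sample_weight (e.g. MultinomialNB,
    LinearSVC)."""
    X = np.vstack([X_lab, X_pseudo])
    y = np.concatenate([y_lab, y_pseudo])
    w = np.concatenate([np.ones(len(y_lab)),
                        np.full(len(y_pseudo), lam)])
    return clf.fit(X, y, sample_weight=w)
```

In the paper's framework, such a weight would then be tuned during co-training by the hybrid objective rather than fixed by hand.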

The final prediction is given by combining the two classifiers, with a parameter μ controlling their relative weights. In addition, we define a pseudo-validation set to tune the value of μ for a better balance between the generative and discriminative components.
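
The exact combination rule is not excerpted in this snippet view; a plausible reading, sketched below under that assumption, is a convex combination of the two base classifiers' class-probability estimates, with μ chosen by grid search on the pseudo-validation set:

```python
import numpy as np

def combined_predict(p_nb, p_svm, mu):
    """Convex combination of the two base classifiers' class
    probabilities (each of shape (N, C)); returns predicted labels."""
    return np.argmax(mu * p_nb + (1.0 - mu) * p_svm, axis=1)

def tune_mu(p_nb, p_svm, y_val, grid=np.linspace(0.0, 1.0, 21)):
    """Grid-search mu on a (pseudo-)validation set for best accuracy."""
    accs = [(combined_predict(p_nb, p_svm, m) == y_val).mean() for m in grid]
    return grid[int(np.argmax(accs))]
```

Setting μ = 1 recovers the pure generative classifier and μ = 0 the pure discriminative one, so the tuned value directly reflects which component the data favor.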

The class imbalance problem arises in many real-world tasks and hurts the performance of standard learning methods. In this paper, we also present different strategies for selecting pseudo-labeled data for NB and SVM, respectively, to deal with the class imbalance problem. Finally, we demonstrate our framework on six text categorization problems addressed in [11].
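
The paper's per-classifier selection strategies are described in the full text. As one common way to keep pseudo-labeling from amplifying class imbalance, the hypothetical helper below allocates the selection budget per class in proportion to the class ratio observed in the labeled data:

```python
import numpy as np

def select_balanced(proba, class_ratio, budget):
    """Pick up to ~budget pool indices, allocating slots per class in
    proportion to class_ratio and keeping the most confident examples.

    proba       : (N, C) predicted class probabilities for the pool
    class_ratio : (C,) class proportions observed in the labeled set
    """
    labels = proba.argmax(axis=1)
    conf = proba.max(axis=1)
    chosen = []
    for c, share in enumerate(class_ratio):
        quota = max(1, int(round(share * budget)))   # rounding may slightly exceed budget
        idx = np.where(labels == c)[0]
        idx = idx[np.argsort(conf[idx])[::-1]][:quota]  # most confident first
        chosen.extend(idx.tolist())
    return chosen
```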

Section snippets

Hybrid generative/discriminative methods

Recently, a number of hybrid generative/discriminative methods have been proposed that perform better than either approach individually. Jaakkola and Haussler use the Fisher score extracted from a generative model to construct the kernel function of a discriminative classifier [12].

Raina et al. present a model for the document classification task: a document is split into multiple regions whose parameters are trained generatively, while the parameters that weight the importance of each region are trained discriminatively.

Co-NB-SVM algorithm

Although our algorithm can work with any type of base classifier, in this paper we employ NB and SVM as base classifiers to leverage the power of both generative and discriminative methods. Next we describe both components in detail.

Experiments

Many semi-supervised learning methods are evaluated on datasets with few, nearly balanced classes. In this section, we show empirical results on multi-class, imbalanced datasets with large numbers of features, and provide a substantial comparison with two co-training style methods and their base classifiers, as well as two state-of-the-art semi-supervised hybrid methods. Furthermore, we show our results for two sets of experiments:

  • (1) We show how the accuracies of three

Conclusion and discussion

We presented a semi-supervised hybrid generative/discriminative method which combines a generative classifier with a discriminative classifier in a new co-training style framework. Furthermore, we introduced a new method, which avoids the plague of local maxima (minima) in probabilistic co-training style algorithms, to balance the impact of labeled and pseudo-labeled data in co-training style algorithms. Moreover, we presented a strategy for selecting pseudo-labeled data to deal with the class imbalance problem.

Acknowledgements

We thank Gregory Druck and Xiaojin Zhu for offering the databases for our experiment. This work was supported by the National Science Foundation of China (Grant No. 61073170) and the National Natural Science Foundation of China Youth Fund Project (Grant No. 60903164).

References (28)

  • U. Brefeld, T. Scheffer, Co-EM support vector learning, in: Proceedings of the International Conference on Machine...
  • J.D.R. Farquhar et al., Two view learning: SVM-2K, theory and practice
  • U. Brefeld, T. Scheffer, Semi-supervised learning for structured output variables, in: Proceedings of the 23rd...
  • G. Druck, C. Pal, X. Zhu, A. McCallum, Semi-supervised classification with hybrid generative/discriminative methods,...