A hybrid generative/discriminative method for semi-supervised classification
Introduction
Most machine learning algorithms rely on the availability of sufficient labeled data. However, labeled data are often expensive to obtain in many application domains, while unlabeled data are readily available in abundance. As a consequence, semi-supervised learning (SSL), which attempts to learn from both labeled and unlabeled data, has attracted much attention in recent years.
A variety of SSL algorithms have been proposed for either generative or discriminative classifiers. In a probabilistic framework, generative classifiers model the joint probability p(x, y) of the input x and the label y, and make predictions by using Bayes' rule to calculate p(y∣x). In contrast, discriminative classifiers directly model the posterior class probabilities p(y∣x), and usually achieve better performance than generative ones. However, when the amount of labeled data is small, generative classifiers can provide better accuracy even when their models are not a very good fit to the data [1], [2], [3]. Moreover, discriminative classifiers cannot naturally incorporate unlabeled data in SSL because, unlike generative classifiers, they do not model p(x).
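To make the generative route concrete, the following minimal sketch (toy numbers and word features are hypothetical, not from the paper) shows how a Naive Bayes classifier models the joint p(x, y) = p(y)·p(x∣y) and recovers the posterior p(y∣x) via Bayes' rule:

```python
# Toy generative classifier: model p(x, y), predict via Bayes' rule.
priors = {"spam": 0.4, "ham": 0.6}                 # p(y)
likelihood = {                                      # p(word present | y)
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham":  {"free": 0.2, "meeting": 0.7},
}

def posterior(x):
    """Return p(y | x) for a binary bag-of-words x (dict: word -> 0/1)."""
    joint = {}
    for y in priors:
        p = priors[y]
        for word, present in x.items():
            p_word = likelihood[y][word]
            p *= p_word if present else (1.0 - p_word)
        joint[y] = p                                # p(x, y) under the NB assumption
    z = sum(joint.values())                         # p(x) = sum over classes
    return {y: joint[y] / z for y in joint}

post = posterior({"free": 1, "meeting": 0})
```

Because the model includes p(x), unlabeled documents can contribute to parameter estimation, which is exactly the property a purely discriminative p(y∣x) model lacks.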
To leverage the strengths of both generative and discriminative methods, several hybrid methods have been proposed in recent years. Most of these define a probabilistic hybrid objective function in which the parameter distributions of the generative part and the discriminative part must belong to the same family, e.g. PCP [2], or even be identical, e.g. MCL [4]. As a result, some prominent classifiers, such as the Support Vector Machine (SVM), can hardly be incorporated into these hybrid methods.
Co-training is a popular SSL algorithm which employs two Naive Bayes (NB) classifiers as base learners that iteratively retrain each other [5], [6]. Owing to the strong performance of the SVM [7], [8], several researchers have recently replaced the NB classifiers with SVMs in co-training style algorithms [8], [9], [10]. However, the SVM is very sensitive to noise, which is inevitable among the pseudo-labeled data of co-training style algorithms, and it performs worse when classes overlap strongly.
We present a new co-training style algorithm, named Co-NB-SVM, that flexibly combines a generative classifier (NB) and a discriminative classifier (SVM). Unlike conventional co-training algorithms, our method does not require two sufficient and redundant views (i.e., attribute sets). To balance the impact of labeled and pseudo-labeled data, we introduce a pair of weight parameters for the pseudo-labeled data; this avoids the plague of local maxima (minima) that afflicts probabilistic co-training style algorithms [6], [8]. Furthermore, we define a hybrid objective function to estimate the optimal values of the weight parameters during co-training.
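The idea of down-weighting pseudo-labeled data can be sketched as follows (a minimal illustration with a hypothetical weight `lam`; the paper's actual objective estimates such weights during co-training rather than fixing them):

```python
# Weighted NB parameter estimation: labeled examples count with weight 1,
# pseudo-labeled examples with weight lam < 1, so noisy pseudo-labels
# cannot dominate the estimates.

def weighted_class_priors(labeled, pseudo, lam, classes):
    """labeled/pseudo: lists of (x, y) pairs. Returns smoothed, weighted p(y)."""
    counts = {c: 1.0 for c in classes}   # Laplace smoothing
    for _, y in labeled:
        counts[y] += 1.0                 # full weight for true labels
    for _, y in pseudo:
        counts[y] += lam                 # reduced weight for pseudo-labels
    total = sum(counts.values())
    return {c: counts[c] / total for c in classes}
```

The same weighting would apply to the feature likelihoods p(x∣y); only the prior estimate is shown here for brevity.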
The final prediction is given by combining the two classifiers, with a parameter μ controlling their relative weights. In addition, we define a pseudo-validation set to tune the value of μ for a better balance between the generative and discriminative components.
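A μ-weighted combination of this kind can be sketched as below (function names and the grid-search tuner are hypothetical illustrations; the paper tunes μ on its pseudo-validation set):

```python
def combined_posterior(p_nb, p_svm, mu):
    """Blend generative and discriminative posteriors; mu weights the NB side."""
    return {y: mu * p_nb[y] + (1.0 - mu) * p_svm[y] for y in p_nb}

def tune_mu(validation, p_nb_fn, p_svm_fn, grid):
    """Pick the mu from `grid` that maximizes accuracy on (x, y) pairs."""
    def accuracy(mu):
        hits = 0
        for x, y in validation:
            post = combined_posterior(p_nb_fn(x), p_svm_fn(x), mu)
            hits += max(post, key=post.get) == y   # predicted class == true class
        return hits / len(validation)
    return max(grid, key=accuracy)
```

When μ = 1 the prediction is purely generative and when μ = 0 purely discriminative; intermediate values trade off the two components.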
The class imbalance problem arises in many real-world tasks and hurts the performance of standard learning methods. In this paper, we also present different strategies for selecting pseudo-labeled data for NB and SVM, respectively, to deal with class imbalance. Finally, we demonstrate our framework on six text categorization problems addressed in [11].
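One common way such a selection strategy can work is a per-class quota on confident pseudo-labels, sketched here as a hypothetical illustration (the paper's own per-classifier strategies may differ):

```python
def select_balanced(candidates, per_class_quota):
    """candidates: list of (confidence, x, predicted_label) tuples.
    Take at most per_class_quota[c] highest-confidence items per class,
    so the majority class cannot flood the pseudo-labeled pool."""
    chosen = []
    taken = {c: 0 for c in per_class_quota}
    for conf, x, c in sorted(candidates, key=lambda t: t[0], reverse=True):
        if taken[c] < per_class_quota[c]:
            chosen.append((x, c))
            taken[c] += 1
    return chosen
```

Without the quota, a confident majority-class classifier would add mostly majority-class pseudo-labels each round, amplifying the imbalance it was trained on.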
Section snippets
Hybrid generative/discriminative methods
Recently, there have been a number of hybrid generative/discriminative methods which perform better than either approach individually. Jaakkola and Haussler use the Fisher score extracted from a generative model to construct the kernel function of a discriminative classifier [12].
Raina et al. present a model for the document classification task: A document is split into multiple regions whose parameters are trained generatively, while the parameters which weight the importance of each region
Co-NB-SVM algorithm
Although our algorithm can work with any type of base classifiers, in this paper we employ NB and SVM as base classifiers to leverage the strengths of both generative and discriminative methods. Next we describe both components in detail.
Experiments
Many semi-supervised learning methods are evaluated on datasets with few, nearly well-balanced classes. In this section, we show empirical results on multi-class, imbalanced datasets with large numbers of features, and provide a substantial comparison with two co-training style methods and their base classifiers, as well as two state-of-the-art semi-supervised hybrid methods. Furthermore, we show our results for two sets of experiments:
- (1)
We show how the accuracies of three
Conclusion and discussion
We presented a semi-supervised hybrid generative/discriminative method which combines a generative classifier with a discriminative classifier in a new co-training style framework. Furthermore, we introduced a new method, which avoids the plague of local maxima (minima) in probabilistic co-training style algorithms, to balance the impact of labeled and pseudo-labeled data for co-training style algorithms. Moreover, we presented a strategy for selecting pseudo-labeled data to deal with the class
Acknowledgements
We thank Gregory Druck and Xiaojin Zhu for offering the databases for our experiment. This work was supported by the National Science Foundation of China (Grant No. 61073170) and the National Natural Science Foundation of China Youth Fund Project (Grant No. 60903164).
References (28)
- et al., Incorporating topic transition in topic detection and tracking algorithms, Expert Systems With Applications (2009)
- et al., Text classification based on multi-word with support vector machine, Knowledge-Based Systems (2008)
- A ‘non-parametric’ version of the Naive Bayes classifier, Knowledge-Based Systems (2011)
- et al., On the effectiveness of preprocessing methods when dealing with different levels of class imbalance, Knowledge-Based Systems (2012)
- et al., A comparative study on rough set based class imbalance learning, Knowledge-Based Systems (2008)
- et al., On discriminative vs. generative classifiers: a comparison of logistic regression and Naive Bayes
- et al., Principled hybrids of generative and discriminative models
- A. McCallum, C. Pal, G. Druck, X. Wang, Multi-conditional learning: generative/discriminative training for clustering...
- A. Blum, T. Mitchell, Combining labeled and unlabeled data with co-training, in: Proceedings of the Workshop on...
- K. Nigam, R. Ghani, Analyzing the effectiveness and applicability of co-training, in: Workshop on Information and...
- Two view learning: SVM-2K, theory and practice