Knowledge-Based Systems

Volume 37, January 2013, Pages 137-145

A hybrid generative/discriminative method for semi-supervised classification

https://doi.org/10.1016/j.knosys.2012.07.020

Abstract

Training methods for machine learning are often characterized as generative or discriminative. We present a new co-training style algorithm that employs a generative classifier (Naive Bayes) and a discriminative classifier (Support Vector Machine) as base classifiers, so as to take advantage of both methods. Furthermore, we introduce a pair of weight parameters to balance the impact of labeled and pseudo-labeled data, and define a hybrid objective function to tune their values during co-training. The final prediction is given by a combination of the base classifiers, and we define a pseudo-validation set to regulate their combination weight. Additionally, we present a strategy for selecting pseudo-labeled data that deals with the class imbalance problem. Experimental results on six datasets show that our method outperforms the compared methods, especially when the amount of labeled data is small.

Introduction

Most machine learning algorithms rely on the availability of sufficient labeled data. However, labeled data are often expensive to obtain in many application domains, while unlabeled data are readily available in abundance. As a consequence, semi-supervised learning (SSL), which attempts to learn from both labeled and unlabeled data, has attracted much attention in recent years.

A variety of SSL algorithms have been proposed for either generative or discriminative classifiers. In a probabilistic framework, generative classifiers model the joint probability p(x, y) of the input x and the label y, and make predictions by using Bayes' rule to calculate p(y|x). In contrast, discriminative classifiers directly model the posterior class probability p(y|x), and usually achieve better performance than generative ones. However, when the amount of labeled data is small, generative classifiers can provide better accuracy even when their models do not fit the data well [1], [2], [3]. Additionally, discriminative classifiers cannot naturally incorporate unlabeled data in SSL, because they do not model p(x) as generative ones do.
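
To make the distinction concrete, the following minimal sketch (our illustration, not code from the paper) shows how a generative classifier such as multinomial Naive Bayes turns its joint model p(x, y) = p(y)p(x|y) into the posterior p(y|x) via Bayes' rule; all names are illustrative assumptions:

```python
import numpy as np

def nb_posterior(x, class_priors, word_probs):
    """Posterior p(y | x) for a multinomial Naive Bayes text model.

    x            : word-count vector, shape (V,)
    class_priors : p(y), shape (C,)
    word_probs   : p(w | y), shape (C, V); each row sums to 1
    """
    # generative model: log p(x, y) = log p(y) + sum_w x_w * log p(w | y)
    log_joint = np.log(class_priors) + np.log(word_probs) @ x
    # Bayes' rule: p(y | x) = p(x, y) / sum_y' p(x, y')
    log_joint -= log_joint.max()          # for numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum()
```

A discriminative classifier would instead parameterize p(y|x) directly, which is why p(x), and with it the unlabeled data, drops out of its objective.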

To leverage the power of both generative and discriminative methods, several hybrid methods have been proposed in recent years. Most of these methods define a probabilistic hybrid objective function in which the parameter distributions of the generative part and the discriminative part must belong to the same family, e.g. PCP [2], or even be identical, e.g. MCL [4]. As a result, some prominent classifiers, such as the Support Vector Machine (SVM), can hardly be incorporated into these hybrid methods.

Co-training is a popular SSL algorithm which employs two Naive Bayes (NB) classifiers as base classifiers that retrain each other iteratively [5], [6]. Owing to the prominent performance of the SVM [7], [8], several researchers have recently replaced the NBs with SVMs in co-training style algorithms [8], [9], [10]. However, the SVM is very sensitive to noise, which is inevitable among the pseudo-labeled data of co-training style algorithms. Additionally, the SVM performs worse when classes overlap strongly.
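
For readers unfamiliar with the scheme, a bare-bones single-view co-training loop looks roughly like the sketch below; the confidence-based selection, number of rounds, and per-round quota are illustrative assumptions, not the settings of [5], [6]:

```python
import numpy as np

def co_train(clf_a, clf_b, X_lab, y_lab, X_pool, rounds=10, k=5):
    """Bare-bones co-training: each round, each classifier pseudo-labels
    its k most confident pool examples for the *other* classifier.
    clf_a and clf_b must implement fit(X, y) and predict_proba(X),
    e.g. sklearn's MultinomialNB or SVC(probability=True)."""
    Xa, ya = list(X_lab), list(y_lab)   # training set grown for clf_a
    Xb, yb = list(X_lab), list(y_lab)   # training set grown for clf_b
    pool = list(X_pool)
    for _ in range(rounds):
        if not pool:
            break
        clf_a.fit(np.array(Xa), np.array(ya))
        clf_b.fit(np.array(Xb), np.array(yb))
        # clf_a teaches clf_b, then clf_b teaches clf_a
        for clf, X_other, y_other in ((clf_a, Xb, yb), (clf_b, Xa, ya)):
            if not pool:
                break
            proba = clf.predict_proba(np.array(pool))
            picks = np.argsort(proba.max(axis=1))[-k:]   # most confident
            for i in sorted(picks, reverse=True):        # pop high indices first
                X_other.append(pool[i])
                y_other.append(int(proba[i].argmax()))
                pool.pop(i)
    # final fit on the augmented training sets
    clf_a.fit(np.array(Xa), np.array(ya))
    clf_b.fit(np.array(Xb), np.array(yb))
    return clf_a, clf_b
```

The noise sensitivity noted above enters exactly at the pseudo-labeling step: once a mislabeled example is appended, both classifiers keep training on it.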

We present a new co-training style algorithm, named Co-NB-SVM, that flexibly combines a generative classifier (NB) and a discriminative classifier (SVM). Unlike conventional co-training algorithms, our method does not require two sufficient and redundant views (i.e., attribute sets). To balance the impact of labeled and pseudo-labeled data, we introduce a pair of weight parameters for the pseudo-labeled data; this avoids the plague of local maxima (minima) that afflicts probabilistic co-training style algorithms [6], [8]. Furthermore, we define a hybrid objective function to estimate the optimal values of the weight parameters during co-training.
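
The hybrid objective and the exact form of the weight parameters are defined in the body of the paper and are not excerpted here. Purely as an illustration of the underlying idea, a discount factor lam can down-weight every pseudo-labeled example relative to the labeled ones via per-instance weights; the helper name and signature below are hypothetical:

```python
import numpy as np

def fit_with_pseudo(clf, X_lab, y_lab, X_pseudo, y_pseudo, lam):
    """Fit clf on labeled plus pseudo-labeled data, discounting each
    pseudo-labeled example by weight lam in (0, 1]; lam = 1 treats
    pseudo-labels as real labels.  Works with any scikit-learn
    estimator whose fit() accepts sample_weight (e.g. MultinomialNB,
    LinearSVC)."""
    X = np.vstack([X_lab, X_pseudo])
    y = np.concatenate([y_lab, y_pseudo])
    w = np.concatenate([np.ones(len(y_lab)),
                        np.full(len(y_pseudo), lam)])
    return clf.fit(X, y, sample_weight=w)
```

In the paper's framework, such a weight would then be tuned during co-training by the hybrid objective rather than fixed by hand.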

The final prediction is given by combining the two classifiers, with a parameter μ controlling their relative weights. In addition, we define a pseudo-validation set to tune the value of μ for a better balance between the generative and discriminative components.
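
The exact combination rule is not excerpted in this snippet view; a plausible reading, sketched below under that assumption, is a convex combination of the two base classifiers' class-probability estimates, with μ chosen by grid search on the pseudo-validation set:

```python
import numpy as np

def combined_predict(p_nb, p_svm, mu):
    """Convex combination of the two base classifiers' class
    probabilities (each of shape (N, C)); returns predicted labels."""
    return np.argmax(mu * p_nb + (1.0 - mu) * p_svm, axis=1)

def tune_mu(p_nb, p_svm, y_val, grid=np.linspace(0.0, 1.0, 21)):
    """Grid-search mu on a (pseudo-)validation set for best accuracy."""
    accs = [(combined_predict(p_nb, p_svm, m) == y_val).mean() for m in grid]
    return grid[int(np.argmax(accs))]
```

Setting μ = 1 recovers the pure generative classifier and μ = 0 the pure discriminative one, so the tuned value directly reflects which component the data favor.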

The class imbalance problem arises in many real-world tasks and hurts the performance of standard learning methods. In this paper, we also present different strategies for selecting pseudo-labeled data for NB and SVM, respectively, to deal with the class imbalance problem. Finally, we demonstrate our framework on six text categorization problems addressed in [11].
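
The paper's per-classifier selection strategies are described in the full text. As one common way to keep pseudo-labeling from amplifying class imbalance, the hypothetical helper below allocates the selection budget per class in proportion to the class ratio observed in the labeled data:

```python
import numpy as np

def select_balanced(proba, class_ratio, budget):
    """Pick up to ~budget pool indices, allocating slots per class in
    proportion to class_ratio and keeping the most confident examples.

    proba       : (N, C) predicted class probabilities for the pool
    class_ratio : (C,) class proportions observed in the labeled set
    """
    labels = proba.argmax(axis=1)
    conf = proba.max(axis=1)
    chosen = []
    for c, share in enumerate(class_ratio):
        quota = max(1, int(round(share * budget)))   # rounding may slightly exceed budget
        idx = np.where(labels == c)[0]
        idx = idx[np.argsort(conf[idx])[::-1]][:quota]  # most confident first
        chosen.extend(idx.tolist())
    return chosen
```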

Section snippets

Hybrid generative/discriminative methods

Recently, a number of hybrid generative/discriminative methods have been proposed that perform better than either approach individually. Jaakkola and Haussler use the Fisher score extracted from a generative model to construct the kernel function of a discriminative classifier [12].

Raina et al. present a model for the document classification task: a document is split into multiple regions whose parameters are trained generatively, while the parameters that weight the importance of each region are trained discriminatively.

Co-NB-SVM algorithm

Although our algorithm can work with any type of base classifier, in this paper we employ NB and SVM as base classifiers to leverage the power of both generative and discriminative methods. Next we describe both components in detail.

Experiments

Many semi-supervised learning methods are evaluated on datasets with few, nearly balanced classes. In this section, we show empirical results on multi-class, imbalanced datasets with large numbers of features, and provide a substantial comparison with two co-training style methods and their base classifiers, as well as two state-of-the-art semi-supervised hybrid methods. Furthermore, we show our results for two sets of experiments:

  • (1) We show how the accuracies of three

Conclusion and discussion

We presented a semi-supervised hybrid generative/discriminative method which combines a generative classifier with a discriminative classifier in a new co-training style framework. Furthermore, we introduced a new method, which avoids the plague of local maxima (minima) in probabilistic co-training style algorithms, to balance the impact of labeled and pseudo-labeled data in co-training style algorithms. Moreover, we presented a strategy for selecting pseudo-labeled data to deal with the class imbalance problem.

Acknowledgements

We thank Gregory Druck and Xiaojin Zhu for offering the databases for our experiment. This work was supported by the National Science Foundation of China (Grant No. 61073170) and the National Natural Science Foundation of China Youth Fund Project (Grant No. 60903164).

References (28)

  • U. Brefeld, T. Scheffer, Co-EM support vector learning, in: Proceedings of the International Conference on Machine...
  • J.D.R. Farquhar et al., Two view learning: SVM-2K, theory and practice
  • U. Brefeld, T. Scheffer, Semi-supervised learning for structured output variables, in: Proceedings of the 23rd...
  • G. Druck, C. Pal, X. Zhu, A. McCallum, Semi-supervised classification with hybrid generative/discriminative methods,...