Semi-supervised learning combining transductive support vector machine with active learning
Introduction
Support vector machine (SVM) is a supervised machine learning approach for solving binary classification pattern recognition problems. It uses the maximum-margin principle to find the decision surface that separates the positive and negative labeled training examples of a class [1]. For a given data point, the standard SVM yields a decision value whose magnitude ranges from 0 to 1 for points on or inside the margin: a value of 0 indicates that the point lies on the hyper-plane, and a value of 1 means that the point is a support vector. Although SVM has been successfully used in various fields [2], [3], [4], [5], [6], in many real-world applications there is not enough labeled data to train a good classification model. Compared to the standard SVM, which uses only labeled training data, many semi-supervised SVMs employ unlabeled data along with some labeled data to train classifiers with improved generalization and performance. Semi-supervised SVM has received much attention for two reasons. First, labeling a large number of examples is time-consuming and labor-intensive; the task must also be carried out by qualified experts and is thus expensive. Second, some studies show that using unlabeled data for learning can improve the accuracy of classifiers [7], [8]. Transductive support vector machine (TSVM) [9] is an efficient method for improving the generalization accuracy of SVM: it finds a labeling of the unlabeled data such that a linear boundary has the maximum margin on both the originally labeled data and the newly labeled unlabeled data [10].
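The decision values described above can be illustrated with a minimal sketch. The toy data, weight vector, and bias below are illustrative assumptions chosen so that the maximum-margin separator is known in closed form: the closest points to the hyper-plane (the support vectors) satisfy |f(x)| = 1, while points farther away have |f(x)| > 1.

```python
import numpy as np

# Hypothetical 2-D toy data: two linearly separable classes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# For this symmetric toy set, the maximum-margin hyper-plane is x1 + x2 = 0;
# w is scaled so that the closest points (support vectors) give |f(x)| = 1.
w = np.array([0.25, 0.25])
b = 0.0

f = X @ w + b  # signed decision values
print(f)       # support vectors [2,2] and [-2,-2] give +1 and -1
```

Points between the hyper-plane and the margin would receive decision values strictly between 0 and 1 in magnitude, matching the description above.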
The notable characteristic of TSVM, being transductive, is that it addresses learning problems concerned only with a particular dataset of testing or working (training) data [9], [11], whereas traditional inductive learning estimates a classifier from training data that generalizes to any input example. The main idea of transductive learning is to build a model for the best prediction performance on a particular testing dataset instead of developing a generalized model applicable to any testing dataset [12]. In other words, by explicitly including the working dataset of unlabeled examples in the problem formulation, better generalization can be achieved on problems with insufficient labeled data points [13]. A common problem, however, is that the machine may incorrectly label the training dataset, which leads to classification error. Active learning offers a solution to this problem.
Active learning (AL) is a technique for selecting a small subset of the unlabeled data such that labeling the subset maximizes the learning accuracy. The selected subset is manually labeled by experts. In this way, AL can complement TSVM by reducing labeling errors [14]. In this paper, we explore combining TSVM with AL to improve the performance of the classification task. The major contributions of our work are:
- 1.
In the learning process, TSVM exploits a large amount of unlabeled data whose geometrical distribution carries useful information. To capture the geometrical structure of the data, we define a regularizer as a function of the graph Laplacian. In this way, the method can explore the structure of the data manifold by adding a regularization term that penalizes any “abrupt changes” of the function values evaluated on neighboring samples in the Laplacian graph.
- 2.
Active learning uses a query framework in which an active learner queries instances for labeling. Our algorithm defines the version-space minimum–maximum division principle as the selection criterion to achieve the best labeling results. The selected most informative instance is the one most likely to be a support vector, which reduces the learning cost by discarding non-support vectors. At the same time, the method supports a batch-sampling mode, which improves training efficiency. Overall, this method achieves a more significant improvement for the same amount of human effort and produces desirable results with considerably fewer labeled data.
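The graph-Laplacian regularizer mentioned in the first contribution can be sketched as follows. This is a generic illustration, not the authors' exact formulation: the k-nearest-neighbour graph construction and the toy data are assumptions. The quadratic form f^T L f equals half the weighted sum of squared differences of f over graph edges, so a labelling that is smooth over the graph incurs a small penalty while an "abruptly changing" one is penalized.

```python
import numpy as np

# Sketch: measure "abrupt changes" of function values f over a
# k-nearest-neighbour graph via the Laplacian quadratic form f^T L f.
def knn_graph(X, k=2):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros_like(d)
    for i in range(len(X)):
        nbrs = np.argsort(d[i])[1:k + 1]   # k nearest neighbours, skip self
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)              # symmetrise the adjacency matrix

def laplacian_penalty(W, f):
    L = np.diag(W.sum(axis=1)) - W         # unnormalised graph Laplacian
    return f @ L @ f                       # = 0.5 * sum_ij W_ij (f_i - f_j)^2

# Two tight clusters; a cluster-consistent labelling is "smooth" on the graph.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
W = knn_graph(X, k=2)
smooth = laplacian_penalty(W, np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0]))
rough = laplacian_penalty(W, np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))
print(smooth, rough)   # the smooth labelling incurs a far smaller penalty
```

Adding such a penalty to the TSVM objective biases the learned function toward labelings that vary slowly across neighboring samples on the manifold.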
The rest of this paper is organized as follows. Section 2 describes the concepts of TSVM, active learning, and graph-based methods. Section 3 reviews some of the issues in TSVM and active learning. Section 4 introduces the proposed algorithm. Section 5 reports experimental results on several UCI datasets and a book reviews dataset, and further analyzes the underlying reasons for the algorithm's behavior. Finally, Section 6 concludes and highlights future work.
Section snippets
Transductive SVM
Transductive SVM (TSVM) is a semi-supervised large-margin classification method based on the low-density separation assumption. Similar to the traditional SVM, TSVM searches for the hyper-plane with the largest margin separating the classes, while simultaneously taking into account labeled and unlabeled examples. Detailed descriptions and proofs of these concepts can be found in [9].
Consider a group of independent and identically distributed labeled examples and u unlabeled examples…
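The truncated setup above can be sketched as the standard TSVM optimization problem in Joachims's formulation [9]; the symbols below (l labeled examples, u unlabeled examples, trade-off parameters C and C*) follow that common convention and are assumptions, not reproduced from the snippet:

```latex
\min_{y_1^*,\dots,y_u^*,\;\mathbf{w},\,b,\;\boldsymbol{\xi},\,\boldsymbol{\xi}^*}
\quad \frac{1}{2}\lVert \mathbf{w} \rVert^2
      + C \sum_{i=1}^{l} \xi_i
      + C^* \sum_{j=1}^{u} \xi_j^*
\qquad \text{s.t.}\quad
\begin{aligned}
  & y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, && \xi_i \ge 0, && i = 1,\dots,l,\\
  & y_j^*(\mathbf{w}\cdot\mathbf{x}_j^* + b) \ge 1 - \xi_j^*, && \xi_j^* \ge 0, && j = 1,\dots,u,
\end{aligned}
```

where the binary labels $y_j^*$ of the unlabeled examples are themselves optimization variables, so the margin is maximized over both the labeled and the (self-labeled) unlabeled data.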
Characteristics of active learning
Active learning (AL) is well motivated in many modern machine learning problems where data may be abundant but labels are scarce or expensive to obtain. It is an interactive learning technique designed to reduce the labor cost of labeling, in which the learning algorithm can freely choose which unlabeled examples to add to the training set. The main idea is to select the most informative examples and ask the expert for their labels in successive learning rounds. The strategy of AL is to select the most…
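The interactive loop described above can be sketched with a generic pool-based uncertainty-sampling strategy. Everything here is an illustrative assumption: the toy data, the `oracle` array standing in for the expert, and the simple perceptron-style `fit_linear` routine standing in for SVM training. Each round queries the pooled example whose decision value is closest to the hyper-plane.

```python
import numpy as np

def fit_linear(X, y, epochs=100, lr=0.1):
    """Perceptron-style stand-in for training a linear classifier."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:   # misclassified: nudge the hyper-plane
                w += lr * yi * xi
    return w

X_lab = np.array([[2.0, 1.0], [-2.0, -1.0]])
y_lab = np.array([1.0, -1.0])
X_pool = np.array([[0.3, 0.2], [4.0, 3.0], [-0.2, -0.4], [-5.0, -2.0]])
oracle = np.array([1.0, 1.0, -1.0, -1.0])  # expert labels, revealed on query

for _ in range(2):                          # two active-learning rounds
    w = fit_linear(X_lab, y_lab)
    q = int(np.argmin(np.abs(X_pool @ w)))  # most uncertain pooled example
    X_lab = np.vstack([X_lab, X_pool[q]])
    y_lab = np.append(y_lab, oracle[q])     # ask the expert for the label
    X_pool = np.delete(X_pool, q, axis=0)
    oracle = np.delete(oracle, q)

print(len(y_lab))   # 4: the two queried examples were added to the labeled set
```

Note that the two near-boundary points are queried first, while the two far, confidently classified points are never sent to the expert; this is the labor saving that motivates AL.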
Combining transductive SVM with active learning
In this section we present a new semi-supervised learning algorithm that combines TSVM with active learning (AL), called the ALTSVM algorithm. First, to explore the data manifold structure, we add a regularization term that penalizes any “abrupt changes” of the function values evaluated on neighboring samples in the Laplacian graph. Second, we propose a new unlabeled-sample selection principle for AL, called the version-space minimum–maximum division principle. Third, we describe the…
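The interplay of the two components can be sketched at a high level. This is a hedged illustration of the general idea, not the authors' exact ALTSVM: `fit_linear` is a perceptron-style stand-in for TSVM training, and the toy data and `oracle` dictionary (the expert's answer for the queried index) are assumptions. Confidently classified unlabeled points receive pseudo-labels (the transductive step), while the least confident point is sent to the expert (the active-learning step).

```python
import numpy as np

def fit_linear(X, y, epochs=100, lr=0.1):
    """Perceptron-style stand-in for (T)SVM training."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += lr * yi * xi
    return w

X_lab = np.array([[2.0, 1.0], [-2.0, -1.0]])
y_lab = np.array([1.0, -1.0])
X_unl = np.array([[3.0, 2.0], [0.1, -0.1], [-3.0, -2.0]])
oracle = {1: -1.0}                     # expert label for the uncertain point

w = fit_linear(X_lab, y_lab)
f = X_unl @ w
pseudo = np.sign(f)                    # transductive pseudo-labels
query = int(np.argmin(np.abs(f)))      # least confident -> ask the expert
pseudo[query] = oracle[query]          # expert corrects the machine's guess
w = fit_linear(np.vstack([X_lab, X_unl]), np.append(y_lab, pseudo))
print(pseudo)
```

Here the near-boundary point would have been pseudo-labeled incorrectly; routing exactly that point to the expert is how AL reduces the labeling errors of the transductive step.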
Experimental results and analysis
To evaluate the performance of the proposed algorithm, we conduct a set of experiments comparing it with several state-of-the-art active learning methods on benchmark UCI datasets [32], and on a book reviews dataset as a real-world application.
Conclusions
In this paper, we proposed to solve the problems of the transductive support vector machine (TSVM) that arise from presetting the number of positive-class samples N. Correctly presetting N before training the TSVM is very difficult and therefore leads to considerable estimation error, especially when the number of labeled examples is very small. To avoid using more unlabeled examples in a naive way, we employed active learning. Studies have found no correlation between using more unlabeled examples…
Acknowledgements
This work was supported by the National Key Basic Research Program (973) of China under Grant no. 2013CB328903, the National Science Foundations of China under Grant nos. 61379158 and 71301177, and the Basic and Advanced Research Program of Chongqing under Grant nos. cstc2013jcyjA1658 and cstc2014jcyjA40054.
References (40)
- et al., The effective use of the one-class SVM classifier for handwritten signature verification based on writer-independent parameters, Pattern Recognit. (2015)
- et al., Fault detection based on a robust one class support vector machine, Neurocomputing (2014)
- et al., Credit rating with a monotonicity-constrained support vector machine model, Expert Syst. Appl. (2014)
- et al., Support vector machines classification based on particle swarm optimization for bone age determination, Appl. Soft Comput. (2014)
- et al., Real estate price forecasting based on SVM optimized by PSO, Optik—Int. J. Light Electron Opt. (2014)
- et al., Learning with progressive transductive support vector machine, Pattern Recognit. Lett. (2003)
- et al., Combining active and semi-supervised learning for spoken language understanding, Speech Commun. (2005)
- et al., A class of smooth semi-supervised SVM by difference of convex functions programming and algorithm, Knowl. Based Syst. (2013)
- et al., Semi-supervised learning combining co-training with active learning, Expert Syst. Appl. (2014)
- et al., Bagging, bumping, multiview, and active learning for record linkage with empirical results on patient identity data, Comput. Methods Prog. Biomed. (2012)
- Semi-supervised and active learning with the probabilistic RBF classifier, Neurocomputing
- A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov.
- Semi-supervised support vector machines, Adv. Neural Inf. Process. Syst.
- Recent advances on support vector machines research, Technol. Econ. Dev. Econ.
- Transductive support vector machines: promising approach to model small and unbalanced datasets, Mol. Inf.
Xibin Wang is a PhD student in College of Computer Science at Chongqing University, China. He received his MS degree in computer science from Guizhou University in 2012. His research focuses on computational intelligence, data mining and business intelligence, and machine learning.
Junhao Wen received his BS, MS and PhD degrees in computer science from Chongqing University, China, in 1991, 1999 and 2008, respectively. Currently, he is a professor and PhD supervisor at Chongqing University. His research focuses on recommender systems, data mining and business intelligence, and machine learning.
Shafiq Alam received his PhD degree from University of Auckland, New Zealand. He is currently a postdoctoral research fellow at the Department of Computer Science, University of Auckland. He has published in International Journals of high repute, A* ranked conferences, and edited a book in his research area. He has been also a general chair of a workshop, and served on the program committee of various conferences. His research interests include computational intelligence, shilling and fraud detection in recommender systems, data mining, and decision support systems.
Zhuo Jiang received his BS degree from the Mathematics department of Xinjiang University, China, in 2008 and his MS degree in computer science from Chongqing University, China, in 2010. Currently, he is a PhD candidate at Chongqing University, China. His research interest includes AI planning, Web service and data mining.
Yingbo Wu received his BS, MS and PhD degrees in Computer Science from Chongqing University, China, in 2000, 2003, and 2012, respectively. Currently, he is a professor and MS supervisor at Chongqing University. His research focuses on distributed and intelligent computing, service systems engineering, field engineering and industry information.