Online semi-supervised learning with learning vector quantization
Introduction
Supervised machine learning tasks [1], [2], [3] have achieved considerable success in recent years. However, most existing studies suffer from three underlying restrictions: (1) they rely on the fully supervised learning paradigm, so the required amount of labeled data is very large; (2) they assume a static environment and are hard to adjust to newly arriving data; (3) they require all the data to be collected and trained on in advance, which means much time is spent before the trained model can be deployed. Considering real-world applications: firstly, unlabeled data are cheap and easily available, while manual labeling is both time-consuming and costly; secondly, data accumulate over time, and it is impractical to train a new model from scratch each time; and last but not least, the model should adapt to new environments as fast as possible. In order to handle the mixture of labeled and unlabeled samples in a data stream, online semi-supervised learning (OSSL), a combination of online learning [4], [5], [6] and semi-supervised learning [7], [8], [9], has been proposed and investigated in recent years [10], [11], [12].
OSSL has a wide range of applications, including object tracking [13], network traffic analysis [14] and social networks [15]. However, the problem definitions, evaluation metrics and experimental settings in OSSL vary across application scenarios. In this paper, we target the following online semi-supervised classification task: suppose we observe an infinite data sequence where x(t) ∈ ℝ^d is the feature vector of the t-th data point. The label y(t) is revealed only with a small probability p; otherwise x(t) remains unlabeled. The learner updates the model based on x(t), and on y(t) if it exists. Compared with traditional supervised learning, OSSL methods need to update models more effectively since each sample is seen only once. In addition, they should be able to exploit the unlabeled data in the stream efficiently to boost classification performance.
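The stream protocol above can be sketched in a few lines. This is a minimal illustration only; the learner interface (`update_labeled` / `update_unlabeled`) is a hypothetical stand-in, not the paper's API:

```python
import random

class CountingModel:
    """Stand-in learner that only counts update calls (for illustration)."""
    def __init__(self):
        self.n_labeled = 0
        self.n_unlabeled = 0
    def update_labeled(self, x, y):
        self.n_labeled += 1
    def update_unlabeled(self, x):
        self.n_unlabeled += 1

def run_ossl_stream(model, stream, p=0.1, seed=0):
    """Process a data stream one sample at a time.  Each label y(t) is
    revealed only with probability p; otherwise the sample is treated
    as unlabeled and only x(t) is used."""
    rng = random.Random(seed)
    for x, y in stream:
        if rng.random() < p:
            model.update_labeled(x, y)   # supervised update
        else:
            model.update_unlabeled(x)    # unsupervised update

# Toy demo: 1000 identical samples, ~10% of labels revealed.
model = CountingModel()
stream = [([0.0, 0.0], 0)] * 1000
run_ossl_stream(model, stream, p=0.1)
```

With p small, the learner sees far more unlabeled than labeled updates, which is exactly why an OSSL method must extract value from the unlabeled majority.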
Recently, a wide variety of studies have attempted to address the OSSL problem. A pioneering line of work is the “online semi-supervised graph-based methods” [10], [12], [16], in which an adjacency graph is maintained dynamically as new data samples arrive. Though these methods have proved effective on binary classification problems, their computational complexity is too high to extend to large-scale multiclass classification problems. Another popular family of OSSL methods, referred to as the “online semi-supervised SVM-based methods” [17], [18], [19], learns max-margin decision boundaries under the cluster or manifold assumption. The key limitation of these algorithms is that maintaining the KKT conditions becomes a significant overhead, because the set of support vectors keeps growing when dealing with extremely large-scale databases.
On the other hand, approaches based on prototype learning, e.g., learning vector quantization (LVQ) or self-organizing maps (SOM), are trained with stochastic gradient descent and can therefore be easily extended to the online setting without storing any of the training data explicitly. Prototype-based learning methods can be broadly divided into two groups: supervised classification models and unsupervised clustering models. The former learn a set of prototypes as representatives for each class, while the latter learn a set of prototypes as representatives of the whole data distribution. Though some pioneering works [20], [21] on online semi-supervised prototype learning have been proposed, most of them are designed heuristically and lack a well-defined cost function.
In this paper, we propose a novel OSSL method based on learning vector quantization, named OSS-LVQ. OSS-LVQ explicitly defines a cost function that mediates between the learning processes for labeled and unlabeled data. Specifically, we capture the ‘right’ classification boundary discriminatively from labeled data with a conventional learning vector quantization classifier, and explore how to extract useful knowledge from unlabeled data. Two different unsupervised techniques, namely a Gaussian mixture clustering criterion and a neural gas clustering criterion, are considered to improve the classifier using the abundant unlabeled data. For the Gaussian mixture clustering criterion, we assume the samples of each class are generated from a Gaussian mixture distribution; when an unlabeled sample arrives, the log-likelihood of the joint distribution over all classes is maximized. For the neural gas criterion, we optimize the cost function of neural gas clustering on the unlabeled data. The means of the Gaussian mixture models, or the cluster centroids of neural gas, are shared with the prototypes of LVQ. Therefore, the class information of labeled data can be transferred to unlabeled data, while the structure information from unlabeled data can be utilized to boost classification performance. Meanwhile, since these two learning criteria for labeled and unlabeled samples differ in scale, we use an online modulation technique to combine them in a unified framework.
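The two unsupervised criteria can each be written as a single stochastic update of the shared prototypes. The sketch below is our own simplified illustration, assuming isotropic Gaussian components with equal mixing weights for the GMM case; function names and step sizes are not the paper's:

```python
import numpy as np

def gmm_step(protos, x, lr=0.05, sigma=1.0):
    """Gaussian mixture criterion (sketch): treat each prototype as the
    mean of an isotropic Gaussian component.  One stochastic gradient
    ascent step on the log-likelihood moves every prototype toward x,
    weighted by its posterior responsibility."""
    d2 = np.sum((protos - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))   # unnormalized responsibilities
    w /= w.sum()
    protos += lr * w[:, None] * (x - protos)

def ng_step(protos, x, lr=0.05, lam=1.0):
    """Neural gas criterion (sketch): every prototype moves toward x with
    a rank-decaying neighborhood factor exp(-rank / lambda)."""
    d = np.linalg.norm(protos - x, axis=1)
    ranks = np.argsort(np.argsort(d))      # rank 0 = closest prototype
    h = np.exp(-ranks / lam)
    protos += lr * h[:, None] * (x - protos)

# Toy demo: an unlabeled sample pulls all prototypes closer under both rules.
x = np.array([0.5, 0.0])
protos_gmm = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]])
protos_ng = protos_gmm.copy()
before = np.linalg.norm(protos_gmm - x, axis=1)
gmm_step(protos_gmm, x)
ng_step(protos_ng, x)
after_gmm = np.linalg.norm(protos_gmm - x, axis=1)
after_ng = np.linalg.norm(protos_ng - x, axis=1)
```

In both rules the nearest prototypes are influenced most, so unlabeled samples refine prototype positions without ever using class labels.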
Fig. 1 visualizes how prototypes are updated by each instance in an online semi-supervised fashion. The labeled data (red triangles) and unlabeled data (blue squares) arrive randomly in a sequential manner. Two complementary updating criteria are integrated to learn prototypes in OSS-LVQ, namely:
- CLL-OLVQ: conditional log-likelihood criterion with online learning vector quantization;
- GMM/NG: Gaussian mixture clustering criterion / neural gas clustering criterion for online unsupervised prototype learning.
Specifically, we use CLL-OLVQ to update prototypes for labeled data and GMM/NG for unlabeled data. In CLL-OLVQ, prototypes of the same class as the arriving sample are moved towards the sample, and prototypes of rival classes are moved away from it. Meanwhile, in GMM/NG all prototypes are moved towards the current sample. These two processes are combined with the online modulation technique and applied adaptively according to whether each sample is labeled, which boosts the performance of OSS-LVQ.
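The supervised attract/repel step can be sketched as follows. This is our own simplified illustration of the CLL-OLVQ idea, not the paper's exact gradient; the function name and learning rate are assumptions:

```python
import numpy as np

def lvq_step(protos, labels, x, y, lr=0.05):
    """Supervised step (sketch): pull the nearest prototype of the true
    class y toward the sample x, and push the nearest rival-class
    prototype away from it."""
    d = np.linalg.norm(protos - x, axis=1)
    same = np.flatnonzero(labels == y)
    rival = np.flatnonzero(labels != y)
    i = same[np.argmin(d[same])]     # nearest same-class prototype
    j = rival[np.argmin(d[rival])]   # nearest rival prototype
    protos[i] += lr * (x - protos[i])   # attract
    protos[j] -= lr * (x - protos[j])   # repel

# Toy demo: one prototype per class, a labeled sample of class 0.
protos = np.array([[0.0, 0.0], [1.0, 0.0]])
labels = np.array([0, 1])
x = np.array([0.4, 0.0])
lvq_step(protos, labels, x, 0)
d_same = np.linalg.norm(protos[0] - x)    # was 0.4 before the step
d_rival = np.linalg.norm(protos[1] - x)   # was 0.6 before the step
```

Because the supervised step has a fixed attract/repel magnitude while the unsupervised updates scale with responsibilities or neighborhood ranks, the two gradients differ in size, which is what the online modulation technique compensates for.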
The contributions of our work are fourfold:
- (1) By extending the conventional learning vector quantization method, we provide a general solution for the online semi-supervised classification task, especially when labeled samples are very limited.
- (2) Two simple but effective strategies, namely the Gaussian mixture clustering criterion and the neural gas clustering criterion, are introduced for learning from unlabeled data.
- (3) An online modulation technique is used to combine the supervised and unsupervised learning criteria into a unified framework.
- (4) Compared to previous OSSL approaches, our proposed method achieves better classification performance while enjoying a much lower computational cost.
The rest of the paper is organized as follows. In Section 2, we briefly discuss related work. Section 3 presents our approach in detail. We describe the datasets and present experimental results in Sections 4 and 5. Finally, Section 6 concludes the paper.
Related work
Our work is closely related to three major subfields of machine learning research: online semi-supervised learning, learning vector quantization and prototype-based clustering. Below we briefly review representative work in each subfield.
Online semi-supervised learning vector quantization
In this section, we first give a general introduction to our proposed online semi-supervised learning vector quantization (OSS-LVQ) model in Section 3.1, then present the details of learning criterion from labeled and unlabeled samples in Section 3.2 and Section 3.3, respectively. Finally, we discuss how to use the online modulation technique to combine supervised and unsupervised learning criteria in Section 3.4.
Experiments and results
In this section, we first introduce the six benchmark datasets used in our experiments: MNIST [42], 20 Newsgroups, CIFAR-10-2048D, CIFAR-10-500D, CIFAR-100-2048D and CIFAR-100-500D [43]. Then, we describe the comparison methods and experimental settings. Finally, we compare the classification performance of our proposed methods with that of the baseline methods.
Discussion and analysis
To further investigate various aspects of our proposed method, we conduct experiments varying one factor at a time while keeping the others fixed. Specifically, we show the classification performance under different percentages of labels in Section 5.1. Section 5.2 discusses how the unlabeled data help in our proposed method. Finally, we analyze the effect of hyper-parameters in Section 5.3. To simplify the experiments, we use one prototype per class except where otherwise noted.
Conclusions
Learning vector quantization (LVQ) represents a well-known family of algorithms with a long history and plays a key role in many applications. However, most existing learning vector quantization algorithms presume that all samples are labeled, which makes the human labeling process expensive. In order to exploit unlabeled data for boosting classification performance, we incorporate the cluster assumption and propose a versatile online semi-supervised method based on LVQ.
CRediT authorship contribution statement
Yuan-Yuan Shen: Conceptualization, Methodology, Software, Writing - original draft. Yan-Ming Zhang: Methodology, Writing - review & editing. Xu-Yao Zhang: Methodology. Cheng-Lin Liu: Conceptualization, Methodology, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work has been supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, the National Natural Science Foundation of China (NSFC) Grants 61836014, 61773376, 61721004.
References (47)
- Online semi-supervised support vector machine. Inf. Sci., 2018.
- An introduction to neural computing. Neural Netw., 1988.
- Regularized margin-based conditional log-likelihood loss for prototype learning. Pattern Recognit., 2010.
- Learning vector quantization for (dis-)similarities. Neurocomputing, 2014.
- Competitive learning algorithms for vector quantization. Neural Netw., 1990.
- Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recognit., 2003.
- Generalized relevance LVQ (GRLVQ) with correlation measures for gene expression analysis. Neurocomputing, 2006.
- Supervised machine learning: a review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng., 2007.
- Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput., 1998.
- Deep learning. Nature, 2015.
- Libol: a library for online learning algorithms. J. Mach. Learn. Res.
- Online bagging and boosting. IEEE International Conference on Systems, Man and Cybernetics.
- On-line random forests. IEEE International Conference on Computer Vision Workshops.
- Semi-supervised learning using Gaussian fields and harmonic functions. International Conference on Machine Learning.
- Semi-supervised learning literature survey. Comput. Sci., Univ. Wisconsin-Madison.
- Semi-Supervised Learning.
- Online manifold regularization: a new learning setting and empirical study. Joint European Conference on Machine Learning and Knowledge Discovery in Databases.
- Online semi-supervised perception: real-time learning without explicit feedback. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
- Online semi-supervised learning on quantized graphs. Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence.
- Semi-supervised on-line boosting for robust tracking. European Conference on Computer Vision.
- Online data stream classification with incremental semi-supervised learning. ACM IKDD Conference on Data Sciences.
- Simnest: social media nested epidemic simulation via online semi-supervised deep learning. IEEE International Conference on Data Mining.
- Semi-supervised learning on data streams via temporal label propagation. International Conference on Machine Learning.
Yuan-Yuan Shen received the B.S. degree from Anhui University, China, and the M.E. degree in computer science and technology from Xiamen University, China, in 2010 and 2015 respectively. Currently, she is pursuing the PhD degree at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China. Her research interests lie in the field of machine learning, pattern recognition and data stream classification. Her research goal is to develop techniques for continuous learning by addressing challenges and constraints that arise in practical scenarios.
Yan-Ming Zhang is currently an associate professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China. He received the Bachelor’s degree from the Beijing University of Posts and Telecommunications, Beijing, China, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, in 2011. His current research interests include machine learning, pattern recognition, and graph neural network.
Xu-Yao Zhang is currently an associate professor in National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received BS degree in computational mathematics from Wuhan University, Wuhan, China, in 2008 and PhD degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2013. From March 2015 to March 2016, he was a visiting scholar in Montreal Institute for Learning Algorithms (MILA), University of Montreal, Canada. His research interests include machine learning, pattern recognition, handwriting recognition, and deep learning.
Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China, and is now the director of the laboratory. He received the B.S. degree in electronic engineering from Wuhan University, Wuhan, China, the M.E. degree in electronic engineering from Beijing Polytechnic University, Beijing, China, the Ph.D. degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1989, 1992 and 1995, respectively. He was a postdoctoral fellow at Korea Advanced Institute of Science and Technology (KAIST) and later at Tokyo University of Agriculture and Technology from March 1996 to March 1999. From 1999 to 2004, he was a research staff member and later a senior researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially the applications to character recognition and document analysis. He has published over 300 technical papers in journals and conferences. He won the IAPR/ICDAR Young Investigator Award of 2005. He is an associate editor-in-chief of Pattern Recognition Journal, an associate editor of Image and Vision and Computing, International Journal on Document Analysis and Recognition, and Cognitive Computation. He is a Fellow of the IAPR and the IEEE.