
Neurocomputing

Volume 399, 25 July 2020, Pages 467-478

Online semi-supervised learning with learning vector quantization

https://doi.org/10.1016/j.neucom.2020.03.025

Highlights

  • We illustrate a novel prototype-based methodology to tackle the problem of online semi-supervised learning (OSSL).

  • Two effective strategies based on clustering assumption are incorporated into the learning process of unlabeled data.

  • An online modulation technique considers how to combine supervised learning criterion with unsupervised learning criterion.

  • Compared to previous OSSL approaches, the proposed algorithm achieves better classification performance with lower computational cost.

Abstract

Online semi-supervised learning (OSSL) is a learning paradigm simulating human learning, in which the data appear in a sequential manner with a mixture of both labeled and unlabeled samples. Despite recent advances, there are still many unsolved problems in this area. In this paper, we propose a novel OSSL method based on learning vector quantization (LVQ). LVQ classifiers, which represent the data of each class by a set of prototypes, have found use in a wide range of pattern recognition problems and can be naturally adapted to the online scenario by updating the prototypes with stochastic gradient optimization. However, most existing LVQ algorithms were designed for supervised classification. To extract useful information from unlabeled data, we propose two simple and computationally efficient methods based on the clustering assumption. Specifically, we use the maximum conditional likelihood criterion for updating prototypes when a data sample is labeled, and the Gaussian mixture clustering criterion or the neural gas clustering criterion for adjusting prototypes when a data sample is unlabeled. These two criteria are applied alternately according to the availability of label information, making full use of both labeled and unlabeled data to boost performance. Extensive experiments show that the proposed method achieves higher accuracy than both the baseline methods and graph-based methods, and is much more efficient than graph-based methods in both training and testing.

Introduction

Supervised machine learning tasks [1], [2], [3] have achieved considerable success in the past years. However, most existing studies suffer from three underlying restrictions: (1) they rely on the fully supervised learning paradigm, and thus require a large amount of labeled data; (2) they focus on a static environment and are hard to adjust to newly arriving data; (3) they assume that all the data must be collected and trained on in advance, which delays deployment of the trained model. Considering real-world applications: firstly, unlabeled data are cheap and easily available, while manual labeling is both time-consuming and costly; secondly, data are accumulated over time, and it is impractical to train a new model from scratch each time; and last but not least, the model should adapt to a new environment as fast as possible. To handle the mixture of labeled and unlabeled samples in a data stream, online semi-supervised learning (OSSL), as a combination of online learning [4], [5], [6] and semi-supervised learning [7], [8], [9], has been raised and investigated in recent years [10], [11], [12].

OSSL has a wide range of applications including object tracking [13], network traffic [14] and social networks [15]. However, the problem definition, evaluation metrics and experimental settings in OSSL are diverse among different application scenarios. In this paper, we target the following online semi-supervised classification task: suppose we observe an infinite data sequence x(1), …, x(t), …, where x(t) ∈ Rd is the feature vector of the t-th data point. The label y(t) is revealed only with a small probability p; otherwise x(t) remains unlabeled. The learner updates the model based on x(t), and on y(t) if it exists. Compared with traditional supervised learning, OSSL methods need to update their models more effectively since each sample is seen only once. In addition, they should be capable of exploiting the unlabeled data in the stream efficiently to boost classification performance.
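The interaction protocol above can be sketched as a simple driver loop. This is an illustrative sketch only: the `stream_learn` function, the `update_labeled`/`update_unlabeled` method names, and the stub model are hypothetical names, not part of the paper.

```python
import random

def stream_learn(model, stream, p=0.1, seed=0):
    """Hypothetical OSSL protocol driver: each sample x(t) arrives once;
    its label y(t) is revealed only with a small probability p."""
    rng = random.Random(seed)
    for x, y in stream:                  # stream yields (feature vector, true label)
        if rng.random() < p:
            model.update_labeled(x, y)   # supervised step (e.g. the paper's CLL-OLVQ)
        else:
            model.update_unlabeled(x)    # unsupervised step (e.g. GMM/NG)
    return model
```

With small p, the learner sees mostly unlabeled samples, which is why an effective unsupervised criterion matters.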

Recently, a wide variety of studies have attempted to address the OSSL problem. A pioneering line of work is the "online semi-supervised graph-based methods" [10], [12], [16], in which an adjacency graph is maintained dynamically as each new data sample arrives. Though these methods have proved effective on binary classification problems, their computational complexity is too high to extend them to large-scale multiclass classification problems. Another popular family of OSSL methods, referred to as "online semi-supervised SVM-based methods" [17], [18], [19], learns max-margin decision boundaries under the clustering assumption or the manifold assumption. The key limitation of these algorithms is that maintaining the KKT conditions becomes a significant overhead, due to the growing set of support vectors, when dealing with extremely large-scale databases.

On the other hand, approaches based on prototype learning, e.g., learning vector quantization (LVQ) or self-organizing maps (SOM), are trained with stochastic gradient descent and can therefore be easily extended to the online setting without explicitly storing any of the training data. Prototype-based learning methods can be broadly divided into two groups: supervised classification models and unsupervised clustering models. The former learn a set of prototypes as representatives for each class, while the latter learn a set of prototypes as representatives for the whole data distribution. Though some pioneering works [20], [21] on online semi-supervised prototype learning have been proposed, most of them are designed heuristically and do not have a well-defined cost function.

In this paper, we propose a novel OSSL method based on learning vector quantization, named OSS-LVQ. OSS-LVQ makes an explicit definition of the cost function, which mediates between the learning processes for labeled and unlabeled data. Specifically, we capture the 'right' classification boundary discriminatively from labeled data with a conventional learning vector quantization classifier and explore how to extract useful knowledge from unlabeled data. Two different unsupervised techniques, namely the Gaussian mixture clustering criterion and the neural gas clustering criterion, are considered to improve the classifier with the abundant unlabeled data. For the Gaussian mixture clustering criterion, we assume the samples of each class are generated from a mixture of Gaussians. When an unlabeled sample arrives, the log-likelihood of the joint distribution over all classes is maximized. For the neural gas criterion, we optimize the cost function of neural gas clustering for unlabeled data. The means of the Gaussian mixture models or the cluster centroids of neural gas are shared with the prototypes of LVQ. Therefore, the class information of labeled data can be transferred to unlabeled data, while the structure information from unlabeled data can be utilized to boost classification performance. Since these two learning criteria for labeled and unlabeled samples differ in scale, we use an online modulation technique to combine them in a unified framework.
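The Gaussian mixture criterion for an unlabeled sample can be illustrated with a single online gradient step on the mixture log-likelihood. This is a schematic sketch under simplifying assumptions (equal component priors, a shared isotropic variance `sigma2`), not the paper's exact model; the function and parameter names are illustrative.

```python
import numpy as np

def gmm_unlabeled_step(means, x, lr=0.05, sigma2=1.0):
    """Schematic online gradient-ascent step on a Gaussian-mixture
    log-likelihood for one unlabeled sample x: each prototype (shared
    with a mixture mean) moves towards x in proportion to its posterior
    responsibility. Assumes equal priors and isotropic variance sigma2."""
    d2 = np.sum((means - x) ** 2, axis=1)   # squared distance to each mean
    logp = -d2 / (2.0 * sigma2)             # log of unnormalized Gaussian density
    r = np.exp(logp - logp.max())           # numerically stable softmax
    r /= r.sum()                            # responsibilities p(k | x)
    means += lr * r[:, None] * (x - means)  # responsibility-weighted move
    return r
```

Because the means are shared with the LVQ prototypes, such a step nudges prototypes towards dense regions of unlabeled data, which is the structural information the method exploits.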

Fig. 1 visualizes how prototypes are updated by each instance in an online semi-supervised fashion. The labeled data (red triangles) and unlabeled data (blue squares) arrive randomly in a sequential manner. Two complementary updating criteria are integrated to learn prototypes in OSS-LVQ:

  • CLL-OLVQ: Conditional log-likelihood criterion with online learning vector quantization;

  • GMM/NG: Gaussian mixture clustering criterion / neural gas clustering criterion for online unsupervised prototype learning.

Specifically, we use CLL-OLVQ to update prototypes for labeled data and GMM/NG for unlabeled data. In CLL-OLVQ, prototypes of the same class as the arriving sample are moved towards it, while prototypes of rival classes are moved away from it. In GMM/NG, all prototypes are moved towards the current sample. These two processes are combined with the online modulation technique and applied adaptively according to whether each sample is labeled, to boost the performance of OSS-LVQ.
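The two geometric update patterns just described can be sketched as follows. This is a simplified attract/repel illustration with illustrative learning rates, not the paper's exact CLL-OLVQ gradient (which is derived from a conditional log-likelihood) or its exact neural gas schedule.

```python
import numpy as np

def update_labeled(prototypes, labels, x, y, lr=0.05):
    """Schematic supervised step: attract the nearest prototype of the
    true class y, repel the nearest prototype of any rival class."""
    d = np.linalg.norm(prototypes - x, axis=1)
    same = np.where(labels == y)[0]
    rival = np.where(labels != y)[0]
    i = same[np.argmin(d[same])]
    j = rival[np.argmin(d[rival])]
    prototypes[i] += lr * (x - prototypes[i])   # move towards the sample
    prototypes[j] -= lr * (x - prototypes[j])   # move away from the sample

def update_unlabeled(prototypes, x, lr=0.05, lam=1.0):
    """Schematic neural-gas step: every prototype moves towards the
    sample, weighted by its distance rank (closest moves most)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    ranks = np.argsort(np.argsort(d))           # rank 0 for the closest prototype
    h = np.exp(-ranks / lam)                    # rank-based neighborhood weight
    prototypes += lr * h[:, None] * (x - prototypes)
```

Alternating between these two steps according to the label availability of each incoming sample is the core of the online semi-supervised update loop.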

The contributions of our work are fourfold:

  • (1) By extending the conventional learning vector quantization method, we provide a general solution for the online semi-supervised classification task, especially with very limited labeled samples.

  • (2) Two simple but effective strategies, namely the Gaussian mixture clustering criterion and the neural gas clustering criterion, are introduced into the learning process of unlabeled data.

  • (3) An online modulation technique is used to combine the supervised and unsupervised learning criteria into a unified framework.

  • (4) Compared to previous OSSL approaches, our proposed method achieves better classification performance while enjoying a much lower computational cost.

The rest of the paper is organized as follows. In Section 2, we briefly discuss related work. Section 3 presents our approach in detail. We describe the datasets and present experimental results in Sections 4 and 5. Finally, Section 6 concludes the paper.


Related work

Our work is closely related to three major subfields of machine learning research: online semi-supervised learning, learning vector quantization and prototype-based clustering. Below we briefly review representative work in each subfield.

Online semi-supervised learning vector quantization

In this section, we first give a general introduction to our proposed online semi-supervised learning vector quantization (OSS-LVQ) model in Section 3.1, then present the details of learning criterion from labeled and unlabeled samples in Section 3.2 and Section 3.3, respectively. Finally, we discuss how to use the online modulation technique to combine supervised and unsupervised learning criteria in Section 3.4.

Experiments and results

In this section, we first introduce the six benchmark datasets used in our experiments: MNIST [42], 20 Newsgroups, CIFAR-10-2048D, CIFAR-10-500D, CIFAR-100-2048D and CIFAR-100-500D [43]. We then describe the comparison methods and experimental settings. Finally, we compare the classification performance of our proposed methods against the baseline methods.

Discussion and analysis

To further investigate various aspects of our proposed method, we conduct experiments by varying one factor at a time while keeping the others fixed. Specifically, we show the classification performance with different percentages of labels in Section 5.1. Section 5.2 discusses how the unlabeled data work in our proposed method. Finally, we analyze the effect of hyper-parameters in Section 5.3. To simplify the experiments, we use one prototype per class except in the discussion of the

Conclusions

Learning vector quantization (LVQ) represents a well-known family of algorithms with a long history, and plays a key role in many applications. However, most existing learning vector quantization algorithms presume that all samples are labeled, and human labeling is expensive. In order to exploit unlabeled data for boosting classification performance, we incorporate the clustering assumption to propose a versatile online semi-supervised method based on LVQ.

As shown

CRediT authorship contribution statement

Yuan-Yuan Shen: Conceptualization, Methodology, Software, Writing - original draft. Yan-Ming Zhang: Methodology, Writing - review & editing. Xu-Yao Zhang: Methodology. Cheng-Lin Liu: Conceptualization, Methodology, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work has been supported by the Major Project for New Generation of AI under Grant No. 2018AAA0100400, and the National Natural Science Foundation of China (NSFC) under Grants 61836014, 61773376 and 61721004.


References (47)

  • S.C. Hoi et al., Libol: a library for online learning algorithms, J. Mach. Learn. Res. (2014)
  • N.C. Oza, Online bagging and boosting, IEEE International Conference on Systems, Man and Cybernetics (2005)
  • A. Saffari et al., On-line random forests, IEEE International Conference on Computer Vision Workshops (2009)
  • X. Zhu et al., Semi-supervised learning using Gaussian fields and harmonic functions, International Conference on Machine Learning (2003)
  • X. Zhu, Semi-supervised learning literature survey, Comput. Sci. Univ. Wisconsin-Madison (2006)
  • O. Chapelle et al., Semi-Supervised Learning (2010)
  • A.B. Goldberg et al., Online manifold regularization: a new learning setting and empirical study, Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2008)
  • B. Kveton et al., Online semi-supervised perception: real-time learning without explicit feedback, IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2010)
  • M. Valko et al., Online semi-supervised learning on quantized graphs, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (2010)
  • H. Grabner et al., Semi-supervised on-line boosting for robust tracking, European Conference on Computer Vision (2008)
  • H.R. Loo et al., Online data stream classification with incremental semi-supervised learning, ACM IKDD Conference on Data Sciences (2015)
  • L. Zhao et al., Simnest: social media nested epidemic simulation via online semi-supervised deep learning, IEEE International Conference on Data Mining (2015)
  • T. Wagner et al., Semi-supervised learning on data streams via temporal label propagation, International Conference on Machine Learning (2018)

    Yuan-Yuan Shen received the B.S. degree from Anhui University, China, and the M.E. degree in computer science and technology from Xiamen University, China, in 2010 and 2015 respectively. Currently, she is pursuing the PhD degree at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China. Her research interests lie in the field of machine learning, pattern recognition and data stream classification. Her research goal is to develop techniques for continuous learning by addressing challenges and constraints that arise in practical scenarios.

    Yan-Ming Zhang is currently an associate professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China. He received the Bachelor’s degree from the Beijing University of Posts and Telecommunications, Beijing, China, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, in 2011. His current research interests include machine learning, pattern recognition, and graph neural network.

    Xu-Yao Zhang is currently an associate professor in National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China. He received BS degree in computational mathematics from Wuhan University, Wuhan, China, in 2008 and PhD degree in pattern recognition and intelligent systems from Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2013. From March 2015 to March 2016, he was a visiting scholar in Montreal Institute for Learning Algorithms (MILA), University of Montreal, Canada. His research interests include machine learning, pattern recognition, handwriting recognition, and deep learning.

    Cheng-Lin Liu is a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences, Beijing, China, and is now the director of the laboratory. He received the B.S. degree in electronic engineering from Wuhan University, Wuhan, China, the M.E. degree in electronic engineering from Beijing Polytechnic University, Beijing, China, the Ph.D. degree in pattern recognition and intelligent control from the Chinese Academy of Sciences, Beijing, China, in 1989, 1992 and 1995, respectively. He was a postdoctoral fellow at Korea Advanced Institute of Science and Technology (KAIST) and later at Tokyo University of Agriculture and Technology from March 1996 to March 1999. From 1999 to 2004, he was a research staff member and later a senior researcher at the Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan. His research interests include pattern recognition, image processing, neural networks, machine learning, and especially the applications to character recognition and document analysis. He has published over 300 technical papers in journals and conferences. He won the IAPR/ICDAR Young Investigator Award of 2005. He is an associate editor-in-chief of Pattern Recognition Journal, an associate editor of Image and Vision and Computing, International Journal on Document Analysis and Recognition, and Cognitive Computation. He is a Fellow of the IAPR and the IEEE.
