Abstract
Research on semi-supervised learning (SSL) is of great significance because collecting large quantities of labeled data is very expensive in some fields. Two recent deep-learning-based SSL algorithms, temporal ensembling and virtual adversarial training (VAT), have achieved state-of-the-art accuracy on several classical SSL tasks, yet both have shortcomings. Temporal ensembling simply adds random noise to the training data, so its consistency regularization is not fully exploited. VAT, in turn, incurs considerable time cost because each unlabeled sample requires two extra inferences per epoch to compute the perturbation. In this paper, we propose using virtual adversarial perturbations (VAP) instead of random noise in temporal ensembling to improve performance. We also find that reusing VAP can accelerate the training of VAT without an obvious loss of accuracy. Both methods are validated on MNIST, Fashion-MNIST and SVHN.
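At the core of both proposed modifications is the virtual adversarial perturbation itself. Below is a minimal sketch of how a VAP can be computed with VAT's one-step power-iteration approximation (following Miyato et al.); the function names, the ξ and ε defaults, and the use of a single power-iteration step are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d: torch.Tensor) -> torch.Tensor:
    # Rescale each sample's perturbation to unit L2 norm.
    flat = d.view(d.size(0), -1)
    norm = flat.norm(dim=1).clamp_min(1e-12)
    return d / norm.view(-1, *([1] * (d.dim() - 1)))

def virtual_adversarial_perturbation(model, x, xi=1e-6, eps=2.0, n_power=1):
    """Approximate r_vadv, the direction that most changes the model's output."""
    with torch.no_grad():
        logits = model(x)                       # "virtual" labels: current predictions
    d = _l2_normalize(torch.randn_like(x))      # random start for power iteration
    for _ in range(n_power):
        d = (xi * d).requires_grad_(True)       # xi: small finite-difference step
        # KL divergence between predictions on clean and perturbed inputs.
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1),
                      F.softmax(logits, dim=1), reduction="batchmean")
        d = _l2_normalize(torch.autograd.grad(kl, d)[0])
    return eps * d                              # adversarial perturbation of radius eps
```

The returned perturbation can then stand in for the random noise in temporal ensembling's consistency term, or be cached and reused in later steps, as the abstract proposes, instead of being recomputed from scratch.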
Acknowledgements
The work was supported by the National Key R&D Program of China under Grant 2017YFC1501301, the Natural Science Foundation of China under Grants 61876219, 61503144, 61673188 and 61761130081, the Natural Science Foundation of Hubei Province of China under Grant 2017CFB519.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Cite this article
Zhou, W., Lian, C., Zeng, Z. et al. Mutual Improvement Between Temporal Ensembling and Virtual Adversarial Training. Neural Process Lett 51, 1111–1124 (2020). https://doi.org/10.1007/s11063-019-10132-7