Abstract
We propose a novel consistency-based regularization method for semi-supervised image classification, called feature distribution matching (FDM), which induces smoothness in the feature space by reducing the sliced Wasserstein distance between the feature distributions of the labeled and unlabeled sets. Unlike previous perturbation-based methods, FDM requires no extra computational cost beyond a single regularization loss. Our results show that FDM combined with entropy minimization improves classification accuracy over a supervised-only baseline and several previous methods. We also analyze the method by visualizing feature embeddings, which shows that FDM leads to a smooth data manifold in the feature space.
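The abstract only outlines the regularizer, so the following is a minimal illustrative sketch (not the authors' released code) of how a sliced Wasserstein loss between labeled and unlabeled feature batches could be computed in PyTorch. The function name, the number of random projections, and the assumption that both minibatches have the same size are all illustrative choices, not details taken from the paper.

```python
import torch

def sliced_wasserstein_distance(feat_labeled, feat_unlabeled, n_projections=50):
    """Approximate sliced Wasserstein distance between two (batch, dim) feature batches.

    Features are projected onto random unit directions; along each direction the
    1-D Wasserstein distance is obtained by sorting both projected sets and
    comparing them element-wise, then averaging over directions.
    Assumes both batches have the same number of samples.
    """
    dim = feat_labeled.size(1)
    # Random projection directions, normalized to unit length (dim, n_projections).
    theta = torch.randn(dim, n_projections, device=feat_labeled.device)
    theta = theta / theta.norm(dim=0, keepdim=True)

    # Project both feature batches onto the random directions: (batch, n_projections).
    proj_l = feat_labeled @ theta
    proj_u = feat_unlabeled @ theta

    # 1-D optimal transport: sort each projected set and compare element-wise.
    proj_l_sorted, _ = torch.sort(proj_l, dim=0)
    proj_u_sorted, _ = torch.sort(proj_u, dim=0)
    return ((proj_l_sorted - proj_u_sorted) ** 2).mean()
```

In a training step this term would be added, with an illustrative weight, to the supervised cross-entropy on labeled data and an entropy-minimization term on the unlabeled predictions, consistent with the combination described in the abstract.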
Acknowledgments
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, J., Lee, C., Kim, J. (2018). Regularizing Feature Distribution Using Sliced Wasserstein Distance for Semi-supervised Learning. In: Kaenampornpan, M., Malaka, R., Nguyen, D., Schwind, N. (eds.) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2018. Lecture Notes in Computer Science, vol. 11248. Springer, Cham. https://doi.org/10.1007/978-3-030-03014-8_5
DOI: https://doi.org/10.1007/978-3-030-03014-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03013-1
Online ISBN: 978-3-030-03014-8