
Two-Layer Contractive Encodings with Shortcuts for Semi-supervised Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8226)

Abstract

Supervised training of multi-layer perceptrons (MLPs) with only a few labeled examples is prone to overfitting. Pretraining an MLP with unlabeled samples of the input distribution may achieve better generalization. Usually, pretraining is done in a layer-wise, greedy fashion, which limits the complexity of the learnable features. To overcome this limitation, two-layer contractive encodings have been proposed recently [15]; these, however, pose a more difficult optimization problem. On the other hand, linear transformations of perceptrons have been proposed to make the optimization of deep networks easier [10]. In this paper, we propose to combine these two approaches. Experiments on handwritten digit recognition show the benefits of our combined approach to semi-supervised learning.
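To make the combination concrete, the following is a minimal sketch, assuming PyTorch, of a two-layer encoder with a linear shortcut from the input to the code (in the spirit of the transformed perceptrons of Raiko et al. [10]) trained with a contractive penalty on the encoder Jacobian (following Rifai et al. [14]). It is not the authors' implementation: the layer sizes, the penalty weight lam, and all names are illustrative assumptions, and the stochastic Jacobian estimate is one possible way to keep the penalty cheap.

    import torch
    import torch.nn as nn

    class TwoLayerShortcutEncoder(nn.Module):
        def __init__(self, n_in=784, n_hid=256, n_code=64):
            super().__init__()
            self.fc1 = nn.Linear(n_in, n_hid)
            self.fc2 = nn.Linear(n_hid, n_code)
            # Linear shortcut past the nonlinearity; the kind of linear
            # transformation proposed to ease deep-network optimization [10].
            self.shortcut = nn.Linear(n_in, n_code, bias=False)

        def forward(self, x):
            return self.fc2(torch.tanh(self.fc1(x))) + self.shortcut(x)

    def contractive_penalty(encoder, x):
        # Hutchinson estimate of the squared Frobenius norm of the encoder
        # Jacobian: for v ~ N(0, I), E||J^T v||^2 equals ||J||_F^2.
        x = x.detach().requires_grad_(True)
        code = encoder(x)
        v = torch.randn_like(code)  # random probe vector
        (g,) = torch.autograd.grad(code, x, grad_outputs=v,
                                   create_graph=True)
        return (g ** 2).sum(dim=1).mean()

    # One unsupervised pretraining step on a stand-in unlabeled batch.
    enc = TwoLayerShortcutEncoder()
    dec = nn.Linear(64, 784)        # decoder; weight tying omitted for brevity
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                           lr=1e-3)
    lam = 0.1                       # contraction strength (assumed value)
    x = torch.rand(32, 784)         # placeholder for an MNIST batch
    loss = ((dec(enc(x)) - x) ** 2).mean() + lam * contractive_penalty(enc, x)
    opt.zero_grad()
    loss.backward()
    opt.step()

The random-probe estimate keeps the penalty at one extra backward pass per batch, whereas the exact Jacobian of a two-layer encoder would cost one backward pass per code unit; this is a sketch-level choice, not necessarily the paper's exact objective.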



References

  1. Bengio, Y.: Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127 (2009)


  2. Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8), 1798–1828 (2013)


  3. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 153–160. MIT Press, Cambridge (2007)


  4. Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 24, pp. 2546–2554 (2011)


  5. Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-Supervised Learning. MIT Press, Cambridge (2006)


  6. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR Workshop and Conference Proceedings, vol. 9, pp. 249–256. JMLR W&CP (2010)


  7. Hinton, G., Salakhutdinov, R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)


  8. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989)


  9. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)


  10. Raiko, T., Valpola, H., LeCun, Y.: Deep learning made easier by linear transformations in perceptrons. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR Workshop and Conference Proceedings, vol. 22, pp. 924–932. JMLR W&CP (April 2012)


  11. Ranzato, M., Huang, F.J., Boureau, Y.L., LeCun, Y.: Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2007)


  12. Ranzato, M., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representations with an energy-based model. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems 19, pp. 1137–1144. MIT Press, Cambridge (2007)


  13. Rifai, S., Mesnil, G., Vincent, P., Muller, X., Bengio, Y., Dauphin, Y., Glorot, X.: Higher order contractive auto-encoder. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011, Part II. LNCS, vol. 6912, pp. 645–660. Springer, Heidelberg (2011)


  14. Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: Explicit invariance during feature extraction. In: Getoor, L., Scheffer, T. (eds.) Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 833–840. ACM, New York (2011)


  15. Schulz, H., Behnke, S.: Learning two-layer contractive encodings. In: Villa, A.E.P., Duch, W., Érdi, P., Masulli, F., Palm, G. (eds.) ICANN 2012, Part I. LNCS, vol. 7552, pp. 620–628. Springer, Heidelberg (2012)


  16. Vatanen, T., Raiko, T., Valpola, H., LeCun, Y.: Pushing stochastic gradient towards second-order methods – backpropagation learning with transformations in nonlinearities. arXiv:1301.3476 [cs.LG] (May 2013)





Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schulz, H., Cho, K., Raiko, T., Behnke, S. (2013). Two-Layer Contractive Encodings with Shortcuts for Semi-supervised Learning. In: Lee, M., Hirose, A., Hou, Z.-G., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42054-2_56


  • DOI: https://doi.org/10.1007/978-3-642-42054-2_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-42053-5

  • Online ISBN: 978-3-642-42054-2

  • eBook Packages: Computer Science; Computer Science (R0)
