
Deep Learning

Layer-Wise Learning of Feature Hierarchies

  • Technical Contribution
  • Published in: KI - Künstliche Intelligenz

Abstract

Hierarchical neural networks for object recognition have a long history. In recent years, novel methods were proposed that incrementally learn a hierarchy of features from unlabeled inputs, providing a good starting point for supervised training. These deep learning methods, together with advances in parallel computing, have made it possible to successfully attack problems that were previously impractical in terms of network depth and input size. In this article, we introduce the reader to the basic concepts of deep learning, discuss selected methods in detail, and present application examples from computer vision and speech recognition.
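
To make the layer-wise scheme concrete, the sketch below greedily pre-trains a stack of denoising autoencoders on unlabeled data, one layer at a time, and returns the learned weights as an initialization for supervised fine-tuning. This is a minimal NumPy illustration, not the authors' implementation; the layer sizes, corruption level, learning rate, and tied-weight design are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_dae(data, n_hidden, noise=0.3, lr=0.1, epochs=10, batch=100):
    # Train a single denoising autoencoder with tied weights and a
    # cross-entropy reconstruction loss; return its encoder parameters.
    n_vis = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_vis, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_vis)
    for _ in range(epochs):
        for start in range(0, len(data), batch):
            x = data[start:start + batch]
            # Corrupt the input by zeroing a random fraction of entries,
            # then reconstruct the uncorrupted version.
            x_noisy = x * (rng.random(x.shape) > noise)
            h = sigmoid(x_noisy @ W + b_h)   # encode
            y = sigmoid(h @ W.T + b_v)       # decode with tied weights
            d_y = y - x                      # grad of cross-entropy + sigmoid
            d_h = (d_y @ W) * h * (1.0 - h)  # backprop into the hidden layer
            W -= lr * (x_noisy.T @ d_h + d_y.T @ h) / len(x)
            b_h -= lr * d_h.mean(axis=0)
            b_v -= lr * d_y.mean(axis=0)
    return W, b_h

def pretrain_stack(data, layer_sizes):
    # Greedy layer-wise pre-training: train each layer on the codes
    # produced by the already-trained layers below it.
    weights, codes = [], data
    for n_hidden in layer_sizes:
        W, b = train_dae(codes, n_hidden)
        weights.append((W, b))
        codes = sigmoid(codes @ W + b)
    return weights  # use as initialization for supervised fine-tuning

# Toy usage: random vectors in [0, 1) stand in for unlabeled inputs.
X = rng.random((1000, 64))
stack = pretrain_stack(X, layer_sizes=[32, 16])

Each new layer is trained only on the codes produced by the layers below it; after pre-training, the stack would typically be topped with a classifier and fine-tuned end-to-end with backpropagation.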

Author information

Corresponding author

Correspondence to Hannes Schulz.

About this article

Cite this article

Schulz, H., Behnke, S. Deep Learning. Künstl Intell 26, 357–363 (2012). https://doi.org/10.1007/s13218-012-0198-z
