Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition

  • Conference paper
Artificial Neural Networks – ICANN 2010 (ICANN 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6354)

Abstract

A common practice to gain invariant features in object recognition models is to aggregate multiple low-level features over a small neighborhood. However, the differences between those models make it hard to compare the properties of different aggregation functions. Our aim is to gain insight into different functions by directly comparing them on a fixed architecture for several common object recognition tasks. Empirical results show that a maximum pooling operation significantly outperforms subsampling operations. Despite their shift-invariant properties, overlapping pooling windows offer no significant improvement over non-overlapping pooling windows. By applying this knowledge, we achieve state-of-the-art error rates of 4.57% on the NORB normalized-uniform dataset and 5.6% on the NORB jittered-cluttered dataset.
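The aggregation functions contrasted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `pool2d` and `mean`, the toy `feature_map`, and the window sizes are all hypothetical, and a plain window average stands in for subsampling as a simplifying assumption (the paper's exact formulation may differ). A stride equal to the window size gives non-overlapping windows; a smaller stride gives overlapping ones.

```python
def pool2d(image, size, stride, reduce_fn):
    """Slide a size x size window over a 2D list and reduce each window.

    Hypothetical helper for illustration only; stride == size yields
    non-overlapping windows, stride < size yields overlapping windows.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect every value inside the current window.
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            out_row.append(reduce_fn(window))
        out.append(out_row)
    return out


def mean(values):
    """Plain average, used here as an assumed stand-in for subsampling."""
    return sum(values) / len(values)


# Toy 4x4 feature map with one strong activation per 2x2 block.
feature_map = [
    [1, 0, 2, 0],
    [0, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 4],
]

# Non-overlapping 2x2 windows: max keeps the peak, averaging dilutes it.
max_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=max)
avg_pooled = pool2d(feature_map, size=2, stride=2, reduce_fn=mean)

# Overlapping 2x2 windows (stride 1) produce a larger 3x3 output.
overlap_max = pool2d(feature_map, size=2, stride=1, reduce_fn=max)

print(max_pooled)   # [[1, 2], [3, 4]]
print(avg_pooled)   # [[0.25, 0.5], [0.75, 1.0]]
print(overlap_max)  # [[1, 2, 2], [3, 3, 0], [3, 3, 4]]
```

On this toy input, max pooling preserves each isolated activation while averaging scales it down by the window area, which is one intuition for why the two operations can behave differently on sparse feature maps.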

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scherer, D., Müller, A., Behnke, S. (2010). Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15825-4_10

  • DOI: https://doi.org/10.1007/978-3-642-15825-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15824-7

  • Online ISBN: 978-3-642-15825-4

  • eBook Packages: Computer Science, Computer Science (R0)
