A Multi-Scale Learning Framework for Visual Categorization

  • Conference paper
Computer Vision – ACCV 2010 (ACCV 2010)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6492)

Abstract

Spatial pyramid matching has recently become a promising technique for image classification. Despite its success and popularity, no prior work has tackled the problem of learning the optimal spatial pyramid representation for the given image data and the associated object category. We propose a Multiple Scale Learning (MSL) framework that learns the best weight for each scale in the pyramid. The MSL algorithm produces class-specific spatial pyramid image representations and thus improves recognition performance. We formulate MSL as a multiple kernel learning (MKL) task, which determines the optimal combination of base kernels constructed at different pyramid levels. We conduct a wide range of experiments on the Oxford flower and Caltech-101 datasets, including the use of state-of-the-art feature encoding and pooling strategies. The empirical results on both datasets validate the feasibility of the proposed method.
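
As a rough illustration of the idea in the abstract, the sketch below builds one base kernel per pyramid level and combines them with per-level weights. It is a minimal sketch under stated assumptions, not the authors' implementation: the histogram intersection kernel, the function names `intersection_kernel` and `combine_level_kernels`, and the fixed example weights are all hypothetical choices made here. In an actual MKL setting the weights would be learned (per class) by the MKL solver rather than set by hand.

```python
import numpy as np


def intersection_kernel(X, Y):
    """Histogram intersection kernel between the rows of X and the rows of Y.

    Assumption: this base kernel is chosen for illustration only; the abstract
    does not say which base kernel is used.
    """
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)


def combine_level_kernels(level_features, weights):
    """Weighted sum of one base kernel per pyramid level.

    level_features: list of arrays, one per level, each of shape (n_images, dim_l).
    weights: one nonnegative weight per level (normalized here to sum to 1).
    In MKL these weights are the quantities being learned; here they are given.
    """
    w = np.asarray(weights, dtype=float)
    if w.shape != (len(level_features),):
        raise ValueError("need exactly one weight per pyramid level")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be nonnegative and not all zero")
    w = w / w.sum()
    return sum(w_l * intersection_kernel(F, F) for w_l, F in zip(w, level_features))


# Toy data: 5 images, an 8-word vocabulary, and 1, 4, and 16 cells per level.
rng = np.random.default_rng(0)
features = [rng.random((5, 8 * cells)) for cells in (1, 4, 16)]

# Hypothetical weights favouring the finest level.
K = combine_level_kernels(features, [0.2, 0.3, 0.5])
print(K.shape)
```

The combined matrix `K` can be passed to any classifier that accepts a precomputed kernel. Using a separate weight vector for each class would give the class-specific representations the abstract describes.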

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, SC., Wang, YC.F. (2011). A Multi-Scale Learning Framework for Visual Categorization. In: Kimmel, R., Klette, R., Sugimoto, A. (eds) Computer Vision – ACCV 2010. ACCV 2010. Lecture Notes in Computer Science, vol 6492. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19315-6_24

  • DOI: https://doi.org/10.1007/978-3-642-19315-6_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19314-9

  • Online ISBN: 978-3-642-19315-6

  • eBook Packages: Computer Science, Computer Science (R0)
