Multi-Scale Hierarchy Deep Feature Aggregation for Compact Image Representations

  • Conference paper
Computer Vision – ACCV 2016 Workshops (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10118)

Abstract

Deep Convolutional Neural Networks have set remarkable milestones in computer vision, especially in image classification. However, training a deep network depends heavily on massive amounts of labeled data and expensive computational resources. A number of studies have shown that using a pre-trained model for deep feature extraction can achieve excellent performance. While most of these methods consider only the features from fully connected layers, we delve into the intermediate convolutional layers. We propose the Selected Multi-Scale Convolution feature (SMSC) for compact deep representations. We propose a method for selecting convolutional feature maps and aggregating deep descriptors, and we introduce a method for fusing multi-layer features into a compact representation. Experimental results on the well-known MIT-Indoor dataset demonstrate the effectiveness and efficiency of the proposed method.
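The three stages named in the abstract (feature map selection, descriptor aggregation, multi-layer fusion) can be illustrated with a minimal NumPy sketch. The abstract does not specify any of the underlying rules, so everything concrete here is an assumption made for illustration: channels are ranked by activation variance, descriptors are sum-pooled over space, and layers are fused by concatenation with L2 normalization. The function names (`select_feature_maps`, `aggregate`, `fuse_layers`) and the `keep` parameter are hypothetical, random arrays stand in for the activations of a pre-trained network, and the multi-scale aspect is not modelled.

```python
import numpy as np


def select_feature_maps(fmaps, keep):
    """Keep the `keep` channels with the highest activation variance.

    fmaps: array of shape (C, H, W) from one convolutional layer.
    The variance criterion is an assumption, not the paper's rule.
    """
    c = fmaps.shape[0]
    scores = fmaps.reshape(c, -1).var(axis=1)
    # Indices of the top-`keep` channels, restored to channel order.
    idx = np.sort(np.argsort(scores)[::-1][:keep])
    return fmaps[idx]


def aggregate(fmaps):
    """Sum-pool each channel over space, then L2-normalize (assumed)."""
    vec = fmaps.sum(axis=(1, 2))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def fuse_layers(layers, keep):
    """Concatenate per-layer descriptors and L2-normalize the result."""
    parts = [aggregate(select_feature_maps(f, keep)) for f in layers]
    fused = np.concatenate(parts)
    norm = np.linalg.norm(fused)
    return fused / norm if norm > 0 else fused


# Random stand-ins for activations from two layers of a pre-trained CNN.
rng = np.random.default_rng(0)
layers = [rng.random((64, 28, 28)), rng.random((128, 14, 14))]
descriptor = fuse_layers(layers, keep=32)
print(descriptor.shape)  # (64,): 32 selected channels from each of 2 layers
```

Under these assumptions the descriptor length is `keep` times the number of layers, regardless of the spatial size of each layer, which is what makes the representation compact. Such a vector could then be passed to a linear classifier.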

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under grant number 61401154, by the Natural Science Foundation of Hebei Province under grant number F2016502101, and by the Fundamental Research Funds for the Central Universities under grant number 2015ZD20.

Author information

Corresponding author

Correspondence to Guozhi Xu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhao, Z., Xu, G., Qi, Y. (2017). Multi-Scale Hierarchy Deep Feature Aggregation for Compact Image Representations. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_41

  • DOI: https://doi.org/10.1007/978-3-319-54526-4_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54525-7

  • Online ISBN: 978-3-319-54526-4

  • eBook Packages: Computer Science, Computer Science (R0)
