Multi-Scale Hierarchy Deep Feature Aggregation for Compact Image Representations

Zhao, Zhenbing; Xu, Guozhi; Qi, Yincheng

doi:10.1007/978-3-319-54526-4_41

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10118)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Deep convolutional neural networks have set remarkable milestones in computer vision, especially in image classification. However, training a deep network depends heavily on massive labeled data and expensive computational resources. A number of studies have shown that using a pre-trained model for deep feature extraction can achieve excellent performance. While most of these methods consider only the features from fully connected layers, we delve into the intermediate convolutional layers. We propose the Selected Multi-Scale Convolution feature (SMSC) for compact deep representations. We propose a method for convolutional feature map selection and deep descriptor aggregation, and introduce a fusion method that combines multi-layer features into a compact representation. Experimental results on the well-known MIT-Indoor dataset demonstrate the effectiveness and efficiency of the proposed method.
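The three steps named in the abstract (feature map selection, descriptor aggregation, multi-layer fusion) can be illustrated with a rough NumPy sketch. The abstract does not specify the selection criterion, the pooling operator, or the fusion rule, so everything concrete here is an assumption made for illustration: channels are ranked by mean activation, descriptors are sum-pooled and L2-normalized, and layers are fused by concatenation. The function names and the random stand-in feature maps are hypothetical, not taken from the paper.

```python
import numpy as np

def select_feature_maps(fmap, keep):
    """Keep `keep` channels of a (C, H, W) array.
    ASSUMPTION: channels are ranked by mean activation; the abstract
    does not state the actual selection criterion."""
    scores = fmap.reshape(fmap.shape[0], -1).mean(axis=1)
    idx = np.sort(np.argsort(scores)[::-1][:keep])  # top-`keep`, original order
    return fmap[idx]

def aggregate(fmap):
    """ASSUMPTION: sum-pool each channel over space, then L2-normalize."""
    v = fmap.sum(axis=(1, 2))
    return v / (np.linalg.norm(v) + 1e-12)

def fuse_layers(layer_maps, keep):
    """ASSUMPTION: fuse layers by concatenating per-layer descriptors
    and re-normalizing the result."""
    parts = [aggregate(select_feature_maps(f, keep)) for f in layer_maps]
    v = np.concatenate(parts)
    return v / (np.linalg.norm(v) + 1e-12)

# Random non-negative arrays stand in for post-ReLU activations from
# two convolutional layers of a pre-trained network.
rng = np.random.default_rng(0)
layers = [
    np.abs(rng.standard_normal((64, 28, 28))),
    np.abs(rng.standard_normal((128, 14, 14))),
]
desc = fuse_layers(layers, keep=32)
print(desc.shape)  # (64,)
```

With two layers and 32 selected channels per layer, the fused descriptor has 64 dimensions regardless of the spatial size of each feature map, which is the sense in which such a representation is compact.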

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under grant number 61401154, by the Natural Science Foundation of Hebei Province under grant number F2016502101, and by the Fundamental Research Funds for the Central Universities under grant number 2015ZD20.

Author information

Authors and Affiliations

School of Electrical and Electronic Engineering, North China Electric Power University, 619 Yonghua North Street, Baoding, 071003, Hebei, China
Zhenbing Zhao, Guozhi Xu & Yincheng Qi

Corresponding author

Correspondence to Guozhi Xu.

Editor information

Editors and Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan
Chu-Song Chen
Tsinghua University, Beijing, China
Jiwen Lu
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Kai-Kuang Ma

About this paper

Cite this paper

Zhao, Z., Xu, G., Qi, Y. (2017). Multi-Scale Hierarchy Deep Feature Aggregation for Compact Image Representations. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_41

DOI: https://doi.org/10.1007/978-3-319-54526-4_41
Published: 16 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54525-7
Online ISBN: 978-3-319-54526-4
eBook Packages: Computer Science; Computer Science (R0)
