Multi-scale Deep Residual Networks for Fine-Grained Image Classification

Wang, Xiangyang; Jin, Yusu; Liu, Zhi; Zhao, Yadong; Zhu, Xiaoqiang; Zhang, Juan

doi:10.1007/978-981-10-4211-9_21

Part of the book series: Communications in Computer and Information Science (CCIS, volume 685)

Included in the following conference series:

International Forum of Digital TV and Wireless Multimedia Communication

Abstract

Fine-grained image classification aims to distinguish very similar images, i.e., the subcategories within one class. Compared with generic object recognition, it is much more challenging because of the small inter-class variance. The Deep Residual Network (ResNet) is a recently proposed deep Convolutional Neural Network (CNN) model that has achieved excellent performance on image classification. Though powerful, ResNet, like other contemporary CNN models, exploits only the features extracted from the last output layer for classification, which may be insufficient for fine-grained classification. In this paper, we propose a Multi-scale Residual Network (Multi-scale ResNet) to further improve fine-grained image classification performance. Building on the ResNet model, we extract features from multiple CNN layers and add these high-level and mid-level features together with different weights for the final classification. We compare the proposed model with several state-of-the-art models on two fine-grained image datasets, Stanford Cars and Stanford Dogs, and the experimental results validate the efficacy of our method.
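
The weighted combination of mid-level and high-level features described above can be illustrated with a small sketch. It is not the chapter's implementation: the abstract does not say which layers are tapped, how the weights are chosen, or at which point the features are added. The sketch assumes that each tapped layer's feature map is global-average-pooled, mapped to class scores by its own linear layer, and that the per-layer scores are then summed with scalar weights. All function names, shapes, and numbers below are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Reduce a C x H x W feature map (nested lists) to a length-C vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(vector, weights, bias):
    """Map a feature vector to class scores; weights is num_classes x len(vector)."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def fuse_scales(per_scale_scores, scale_weights):
    """Weighted element-wise sum of class-score vectors, one vector per layer."""
    num_classes = len(per_scale_scores[0])
    return [
        sum(w * scores[k] for w, scores in zip(scale_weights, per_scale_scores))
        for k in range(num_classes)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy inputs (assumed): a mid-level map with 2 channels of size 2x2 and a
# high-level map with 3 channels of size 1x1, for a 2-class problem.
mid_map = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
high_map = [[[2.0]], [[0.0]], [[1.0]]]

mid_vec = global_average_pool(mid_map)    # [4.0, 1.0]
high_vec = global_average_pool(high_map)  # [2.0, 0.0, 1.0]

# Hypothetical per-layer classifiers; in practice these would be learned.
mid_scores = linear(mid_vec, [[0.5, 0.0], [0.0, 1.0]], [0.0, 0.0])
high_scores = linear(high_vec, [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]], [0.0, 0.0])

# Assumed scale weights: the high-level branch is trusted more than the mid-level one.
fused = fuse_scales([mid_scores, high_scores], scale_weights=[0.3, 0.7])
probs = softmax(fused)
predicted_class = probs.index(max(probs))
```

In this toy case the mid-level branch alone favours class 0, while the fused scores favour class 1, which shows how the scale weights control each layer's influence on the final decision.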

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61471230 and 61402277; the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning; the Innovation Program of Shanghai Municipal Education Commission (15ZZ044); the Open Project Program of the State Key Lab of CAD&CG (Grant No. 1507), Zhejiang University; and the Project of Local Colleges’ and Universities’ Capacity Construction of Science and Technology Commission in Shanghai (Grant No. 15590501300). We also thank NVIDIA Corporation for their GPU donation.

Author information

Authors and Affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai, 200444, China
Xiangyang Wang, Yusu Jin, Zhi Liu, Yadong Zhao & Xiaoqiang Zhu
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, 201620, China
Juan Zhang

Corresponding author

Correspondence to Xiangyang Wang.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Xiaokang Yang
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai

About this paper

Cite this paper

Wang, X., Jin, Y., Liu, Z., Zhao, Y., Zhu, X., Zhang, J. (2017). Multi-scale Deep Residual Networks for Fine-Grained Image Classification. In: Yang, X., Zhai, G. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2016. Communications in Computer and Information Science, vol 685. Springer, Singapore. https://doi.org/10.1007/978-981-10-4211-9_21

DOI: https://doi.org/10.1007/978-981-10-4211-9_21
Published: 12 March 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4210-2
Online ISBN: 978-981-10-4211-9
eBook Packages: Computer Science, Computer Science (R0)