Multi-scale Deep Residual Networks for Fine-Grained Image Classification

  • Conference paper
Digital TV and Wireless Multimedia Communication (IFTC 2016)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 685))

Abstract

Fine-grained image classification aims to distinguish very similar images, i.e., the subcategories within one class. Compared with generic object recognition, it is much more challenging because of the small inter-class variance. The Deep Residual Network (ResNet) is a recently proposed deep convolutional neural network (CNN) model that has achieved excellent performance on image classification. Though powerful, ResNet, like other contemporary CNN models, uses only the features extracted from its last output layer for classification, which may be insufficient for fine-grained classification. In this paper, we propose a Multi-scale Residual Network (Multi-scale ResNet) to further improve fine-grained image classification performance. Building on the ResNet model, we extract features from multiple CNN layers and add these high-level and mid-level features together, with different weights, for the final classification. We compare the proposed model with several state-of-the-art models on two fine-grained image datasets, Stanford Cars and Stanford Dogs, and the experimental results validate the efficacy of our method.
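The weighted combination of mid-level and high-level features described in the abstract can be illustrated with a small sketch. The abstract does not say which layers are tapped, how each layer's features are mapped to class scores, or what the weights are, so everything specific here is an assumption made for illustration: the global average pooling step, the per-layer linear classifiers, the toy feature maps, and the weights 0.3 and 0.7. It is not the authors' implementation.

```python
import math

def global_average_pool(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a length-C vector."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]

def linear(vector, weights, bias):
    """Dense layer mapping a feature vector to class scores (weights: classes x features)."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def fuse_scores(per_layer_scores, layer_weights):
    """Weighted element-wise sum of class-score vectors from several layers."""
    num_classes = len(per_layer_scores[0])
    return [sum(w * scores[k] for w, scores in zip(layer_weights, per_layer_scores))
            for k in range(num_classes)]

def softmax(scores):
    """Numerically stable softmax over class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for one mid-level and one high-level feature map (hypothetical values).
mid_map = [[[1.0, 3.0], [1.0, 3.0]],   # channel 0, 2x2
           [[0.0, 0.0], [0.0, 4.0]]]   # channel 1, 2x2
high_map = [[[2.0]], [[4.0]], [[6.0]]]  # three channels, 1x1 each

# Hypothetical per-layer classifiers for a 2-class problem.
mid_scores = linear(global_average_pool(mid_map),
                    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
high_scores = linear(global_average_pool(high_map),
                     [[0.5, 0.0, 0.0], [0.0, 0.0, 0.5]], [0.0, 0.0])

# Assumed fusion weights: the deeper layer is trusted more than the mid-level one.
fused = fuse_scores([mid_scores, high_scores], [0.3, 0.7])
probs = softmax(fused)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(fused, predicted)
```

Fusing at the class-score level keeps the sketch independent of each layer's channel count; fusing the pooled features directly would first require projecting them to a common dimension.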

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 61471230 and 61402277, the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning, the Innovation Program of Shanghai Municipal Education Commission (15ZZ044), the Open Project Program of the State Key Lab of CAD&CG (Grant No. 1507), Zhejiang University, and the Project of Local Colleges’ and Universities’ Capacity Construction of Science and Technology Commission in Shanghai (Grant No. 15590501300). We also thank NVIDIA Corporation for their GPU donation.

Author information

Corresponding author

Correspondence to Xiangyang Wang.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, X., Jin, Y., Liu, Z., Zhao, Y., Zhu, X., Zhang, J. (2017). Multi-scale Deep Residual Networks for Fine-Grained Image Classification. In: Yang, X., Zhai, G. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2016. Communications in Computer and Information Science, vol 685. Springer, Singapore. https://doi.org/10.1007/978-981-10-4211-9_21

  • DOI: https://doi.org/10.1007/978-981-10-4211-9_21

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4210-2

  • Online ISBN: 978-981-10-4211-9

  • eBook Packages: Computer Science (R0)
