Towards Fine-Grained Recognition: Joint Learning for Object Detection and Fine-Grained Classification

  • Conference paper
Advances in Visual Computing (ISVC 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11845)

Abstract

Fine-grained classification is challenging because the categories within a class differ only subtly. In practice, it is often used together with object detection algorithms to locate and identify object categories. Despite recent progress in both fine-grained classification and object detection, few works have presented datasets or solutions that handle both tasks simultaneously. We make two contributions to this problem. First, we construct a fine-grained classification and detection benchmark. Second, we present an end-to-end convolutional neural network (CNN) architecture that detects and classifies fine-grained objects. Experimental results verify that our networks perform favorably against alternatives.

Author information

Corresponding author

Correspondence to Qiaosong Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, Q., Rasmussen, C. (2019). Towards Fine-Grained Recognition: Joint Learning for Object Detection and Fine-Grained Classification. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2019. Lecture Notes in Computer Science, vol 11845. Springer, Cham. https://doi.org/10.1007/978-3-030-33723-0_27

  • DOI: https://doi.org/10.1007/978-3-030-33723-0_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33722-3

  • Online ISBN: 978-3-030-33723-0

  • eBook Packages: Computer Science (R0)
