Fine-grained visual classification via multilayer bilinear pooling with object localization

  • Original article
  • The Visual Computer

Abstract

Fine-grained visual classification is a challenging task in computer vision, and extracting discriminative features is vital to it. Accurate object localization is a crucial step because it suppresses background noise and highlights the object of interest at the same time. However, many current methods locate objects with bounding boxes, which fit poorly when object poses vary. Deep features have also been shown to have strong representational capability, especially bilinear pooling features, which have achieved superior performance in fine-grained visual classification. However, bilinear features captured only from the last convolutional layer have limited discriminability, especially for small-scale objects. In this paper, we propose a multilayer bilinear pooling model combined with object localization. First, a flexible and scalable object localization module locates the object of interest in an image without relying on bounding boxes. Refined features are then obtained by highlighting the object region and suppressing background noise. Finally, multilayer bilinear pooling, which exploits the complementarity between different layers, extracts more discriminative features. Experimental results on three public datasets show that the proposed method achieves competitive performance compared with several state-of-the-art methods.
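
The pipeline summarized in the abstract (localize the object, refine the features, pool bilinearly across layers) can be illustrated with a minimal NumPy sketch. The abstract does not give implementation details, so everything specific here is an assumption made for illustration: the function names `object_mask`, `bilinear_pool`, and `multilayer_bilinear` are hypothetical, the mask is built by thresholding the channel-summed activation at its mean, layers are paired exhaustively, and the signed square root with L2 normalization follows common bilinear pooling practice rather than anything stated in this article.

```python
import numpy as np


def object_mask(feat):
    """Assumed localization step: threshold the channel-summed activation
    of a (C, H, W) feature map at its mean to get a binary (H, W) mask."""
    activation = feat.sum(axis=0)
    return (activation > activation.mean()).astype(feat.dtype)


def bilinear_pool(x, y):
    """Bilinear pooling of two feature maps with the same spatial size.

    x has shape (Cx, H, W) and y has shape (Cy, H, W). The outer products
    of the two descriptors are averaged over all locations, flattened,
    passed through a signed square root, and L2-normalized.
    """
    cx, h, w = x.shape
    cy = y.shape[0]
    xf = x.reshape(cx, h * w)
    yf = y.reshape(cy, h * w)
    b = (xf @ yf.T) / (h * w)            # (Cx, Cy) averaged outer product
    b = b.reshape(-1)
    b = np.sign(b) * np.sqrt(np.abs(b))  # signed square root
    norm = np.linalg.norm(b)
    return b / norm if norm > 0 else b


def multilayer_bilinear(feats):
    """Assumed multilayer scheme: mask every layer with the object mask of
    the last layer, then concatenate bilinear features of all layer pairs.
    The maps in feats are assumed to share the same H and W already."""
    mask = object_mask(feats[-1])
    refined = [f * mask for f in feats]  # (H, W) mask broadcasts over channels
    pooled = [
        bilinear_pool(refined[i], refined[j])
        for i in range(len(refined))
        for j in range(i + 1, len(refined))
    ]
    return np.concatenate(pooled)


# Toy usage: three random "layers" with 4 channels on a 7x7 grid.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((4, 7, 7)) for _ in range(3)]
descriptor = multilayer_bilinear(feats)
print(descriptor.shape)  # (48,) = 3 layer pairs x (4 x 4) interactions
```

In a real network the feature maps would come from different convolutional layers of a trained backbone and would need to be brought to a common spatial size before pooling; the random arrays above only exercise the shapes.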

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 61971426.

Author information

Corresponding author

Correspondence to Lin Lei.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest with other people or organizations.

About this article

Cite this article

Li, M., Lei, L., Sun, H. et al. Fine-grained visual classification via multilayer bilinear pooling with object localization. Vis Comput 38, 811–820 (2022). https://doi.org/10.1007/s00371-020-02052-8

  • DOI: https://doi.org/10.1007/s00371-020-02052-8
