Coordinate feature fusion networks for fine-grained image classification

  • Original Paper
Signal, Image and Video Processing

Abstract

Learning feature representations from discriminative local features plays a key role in fine-grained classification, but many methods focus only on the most salient features in an image and ignore most latent features. Exploiting the rich features across channel and spatial dimensions helps capture these overlooked features. Based on this idea, this paper proposes a Coordinate Feature Fusion Network (CFFN), which models the interactions between the channel and spatial features of an image. CFFN consists of Feature Enhancement and Suppression Modules (FESM) and a Coordinate Feature Interaction Module (CFIM). FESM derives saliency factors by aggregating the most salient parts of the spatial and channel features, obtains salient features through feature mapping, and then suppresses those salient features to force the network to mine the remaining latent features. Together, the salient and latent features allow more discriminative features to be captured. CFIM explores feature correlations within the image, so that the model learns complementary features from related channels and spatial positions, resulting in stronger fine-grained features. The model can be trained end to end and does not require bounding boxes. It achieves 89.5%, 93.4%, and 94.8% accuracy on the three benchmark datasets CUB-200-2011, FGVC-Aircraft, and Stanford Cars, respectively.
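The abstract gives no formulas for FESM, so the following is only a toy sketch of the general enhance-then-suppress idea, not the authors' module. Everything in it is an assumption made for illustration: the function names `saliency_factors` and `enhance_and_suppress`, the `boost` parameter, the use of mean activation as the saliency score, and the hard zeroing of salient entries. It scores each channel and each spatial position of a small C x H x W feature map, amplifies the top-scoring ones in one copy, and zeroes them in another so that only the less salient responses remain.

```python
def saliency_factors(x):
    """Score a C x H x W nested list per channel and per spatial position.

    Assumption: saliency is the mean activation (global average pooling per
    channel, cross-channel mean per position). The paper may use another rule.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    channel = [sum(sum(row) for row in ch) / (H * W) for ch in x]
    spatial = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)]
               for i in range(H)]
    return channel, spatial


def enhance_and_suppress(x, boost=0.5):
    """Return (enhanced, suppressed) copies of the feature map x.

    Hypothetical rule: an entry is 'salient' if it lies in the top-scoring
    channel or at the top-scoring spatial position. Salient entries are scaled
    by (1 + boost) in the enhanced map and set to zero in the suppressed map.
    """
    channel, spatial = saliency_factors(x)
    top_channel = max(channel)
    top_position = max(max(row) for row in spatial)
    enhanced, suppressed = [], []
    for c, ch in enumerate(x):
        enh_ch, sup_ch = [], []
        for i, row in enumerate(ch):
            salient = [channel[c] == top_channel or spatial[i][j] == top_position
                       for j in range(len(row))]
            enh_ch.append([v * (1 + boost) if s else v for v, s in zip(row, salient)])
            sup_ch.append([0.0 if s else v for v, s in zip(row, salient)])
        enhanced.append(enh_ch)
        suppressed.append(sup_ch)
    return enhanced, suppressed


# Two channels of size 2 x 2; channel 0 has the higher mean activation and
# position (1, 1) has the higher cross-channel mean.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[1.0, 1.0], [1.0, 5.0]]]
enhanced, suppressed = enhance_and_suppress(features)
```

In a network of this kind, the suppressed map would typically be passed to a later stage so that it is pushed to learn from the remaining responses; how CFFN wires this together is described in the full paper, not in the abstract.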

Data availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Project Nos. 52075435 and 61771386) and the Natural Science Foundation of Shaanxi Province (No. 2021JM-340).

Author information

Corresponding author

Correspondence to Kaiyang Liao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liao, K., Huang, G., Zheng, Y. et al. Coordinate feature fusion networks for fine-grained image classification. SIViP 17, 807–815 (2023). https://doi.org/10.1007/s11760-022-02291-3

  • DOI: https://doi.org/10.1007/s11760-022-02291-3
