Abstract
Learning representations from discriminative local features plays a key role in fine-grained classification, but many methods focus only on the most salient features in an image and ignore most latent features. Exploiting the rich features across channel and spatial dimensions helps capture the subtle differences between categories. Based on this idea, this paper proposes a Coordinate Feature Fusion Network (CFFN), which models the channel and spatial feature interactions of images. CFFN consists of a Feature Enhancement and Suppression Module (FESM) and a Coordinate Feature Interaction Module (CFIM). FESM derives saliency factors by aggregating the most salient parts of the spatial and channel features, obtains salient features through feature mapping, and then suppresses those salient features to force the network to mine the remaining latent features. With both salient and latent features available, more discriminative features can be captured effectively. CFIM explores feature correlations within an image, so that the model learns complementary features from related channels and spatial positions, yielding stronger fine-grained features. Our model can be trained end-to-end and does not require bounding boxes. It achieves 89.5%, 93.4%, and 94.8% accuracy on three benchmark datasets, CUB-200-2011, FGVC-Aircraft, and Stanford Cars, respectively.
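The enhance-then-suppress idea behind FESM can be illustrated with a minimal sketch. This is not the formulation used in the paper: the saliency factor here is assumed to be a min-max normalised channel average, only the spatial dimension is handled, and the names `spatial_saliency`, `enhance_and_suppress`, `alpha`, and `threshold` are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def spatial_saliency(x: FeatureMap) -> List[List[float]]:
    """Assumed saliency factor: channel-wise mean, min-max scaled to [0, 1]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    mean = [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[(v - lo) / span for v in row] for row in mean]


def enhance_and_suppress(
    x: FeatureMap, alpha: float = 0.5, threshold: float = 0.8
) -> Tuple[FeatureMap, FeatureMap]:
    """Return (enhanced, suppressed) versions of x.

    enhanced:   salient positions are amplified by (1 + alpha * saliency).
    suppressed: positions with saliency >= threshold are zeroed, so a later
                stage has to rely on the remaining, less salient responses.
    """
    s = spatial_saliency(x)
    enhanced = [[[v * (1.0 + alpha * s[i][j]) for j, v in enumerate(row)]
                 for i, row in enumerate(ch)] for ch in x]
    suppressed = [[[0.0 if s[i][j] >= threshold else v
                    for j, v in enumerate(row)]
                   for i, row in enumerate(ch)] for ch in x]
    return enhanced, suppressed


# Toy input: two channels of size 2x2 with a strong peak at position (1, 1).
features = [
    [[1.0, 2.0], [3.0, 8.0]],
    [[1.0, 0.0], [1.0, 4.0]],
]
enhanced, suppressed = enhance_and_suppress(features)
```

In this toy input the peak at position (1, 1) is amplified in `enhanced` and zeroed in `suppressed`, while the other positions of `suppressed` are left unchanged. A real network would apply such an operation to learned convolutional features and would presumably treat the channel dimension as well, as the abstract indicates.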
Data availability
Some or all data, models, or code generated or used during the study are available from the corresponding author by request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Project Nos. 52075435 and 61771386) and the Natural Science Foundation of Shaanxi Province (No. 2021JM-340).
About this article
Cite this article
Liao, K., Huang, G., Zheng, Y. et al. Coordinate feature fusion networks for fine-grained image classification. SIViP 17, 807–815 (2023). https://doi.org/10.1007/s11760-022-02291-3
DOI: https://doi.org/10.1007/s11760-022-02291-3