Accurate Fine-Grained Object Recognition with Structure-Driven Relation Graph Networks

Published in: International Journal of Computer Vision

Abstract

Fine-grained object recognition (FGOR) aims to learn discriminative features that identify the subtle distinctions between visually similar objects. However, less effort has been devoted to overcoming the impact of an object's personalized differences, e.g., varying posture or perspective. We argue that these personalized differences can weaken the network's perception of discriminative features, causing it to discard discriminative clues and degrading FGOR performance accordingly. This motivates us to explore intrinsic structure knowledge, namely the fixed spatial correlation between object parts, and to apply this knowledge to associate diverse semantic parts and thereby recover the discriminative details lost to personalized differences. In this paper, we propose an end-to-end Structure-driven Relation Graph Network (SRGN) for fine-grained object recognition, which explores and exploits object structure information without any additional annotations to associate diverse semantic parts, making the network sensitive to discriminative details influenced by personalized differences. Specifically, the core of SRGN is a Structure-aware Axial Graph (SAG) module, which first infers a structure embedding by establishing the correlation between position information and visual features along the axial direction, and then applies this embedding as aggregation weights, emphasizing each discriminative representation through a weighted reassembly of all features relevant to it. Additionally, SAG extends readily to a multi-graph schema that leverages the complementary advantages of different structure embeddings between position information and visual content, further improving SAG. In this way, SRGN demonstrates remarkable robustness under extreme distribution perturbations, ultimately leading to superior performance. Extensive experiments and explainable visualizations validate the efficacy of the proposed approach on widely used fine-grained benchmarks.
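The core SAG mechanism sketched in the abstract (infer a structure embedding by correlating position codes with visual content along one axis, then reuse it as aggregation weights to reassemble features) can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the mean-pooling over the other axis, the embedding shapes, and the single-axis scope are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_structure_aggregate(feat, pos_emb):
    """Toy single-axis analogue of the structure-aware aggregation idea.

    feat:    (H, W, C) visual feature map
    pos_emb: (H, C) position codes for the vertical axis (assumed shape)

    A structure embedding is inferred by correlating the position codes with
    the visual content pooled along the other axis; its softmax then serves
    as weights to reassemble every row from all rows relevant to it.
    """
    H, W, C = feat.shape
    content = feat.mean(axis=1)                # (H, C): pool over the horizontal axis
    scores = pos_emb @ content.T / np.sqrt(C)  # (H, H): position-content correlation
    weights = softmax(scores, axis=-1)         # aggregation weights per target row
    return np.einsum('ij,jwc->iwc', weights, feat)  # weighted reassembly, (H, W, C)

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 8))
pos = rng.standard_normal((4, 8))
out = axial_structure_aggregate(feat, pos)
print(out.shape)  # (4, 5, 8)
```

In this simplified view, a row whose content correlates strongly with a given position code contributes more to that position's reassembled representation, which is how a fixed spatial structure can re-emphasize details weakened by pose or viewpoint changes.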

Figures 1–9 (images available in the published article).


Data Availability

The datasets used in this study are available in the following public repositories:

  • CUB-200-2011 (Branson et al., 2014): http://www.vision.caltech.edu/datasets/cub_200_2011/
  • Stanford Cars (Krause et al., 2013): https://ai.stanford.edu/~jkrause/cars/car_dataset.html
  • FGVC Aircraft (Maji et al., 2013): https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/
  • iNaturalist2017 (Horn et al., 2018): https://github.com/visipedia/inat_comp/tree/master/2017
  • Market1501 (Bai et al., 2017): https://drive.google.com/file/d/0B8-rUzbwVRk0c054eEozWG9COHM/view?usp=sharing
  • DukeMTMC-reID (Ristani et al., 2016): https://drive.google.com/open?id=1jjE85dRCMOgRtvJ5RQV9-Afs-2_5dY3O
  • MSMT17 (Wei et al., 2018): https://pan.baidu.com/s/19-cKxL_UVKNHc7kqqp0GVg (password: yf3z)
  • VeRi-776 (Liu et al., 2016): https://vehiclereid.github.io/VeRi/
  • VehicleID (Liu et al., 2016): https://www.pkuml.org/resources/pku-vehicleid.html
  • VERI-Wild (Lou et al., 2019): https://github.com/PKU-IMRE/VERI-Wild
  • MS1MV2 (Guo et al., 2016): https://www.dropbox.com/s/wpx6tqjf0y5mf6r/faces_ms1m-refine-v2_112x112.zip?dl=0
  • IJB-B (Whitelam et al., 2017) & IJB-C (Maze et al., 2018): https://drive.google.com/file/d/1aC4zf2Bn0xCVH_ZtEuQipR2JvRb1bf8o/view?usp=sharing

References

  • Bai, S., Bai, X., & Tian, Q. (2017). Scalable person re-identification on supervised smoothed manifold. In: 2017 IEEE conference on computer vision and pattern recognition, CVPR 2017, Honolulu, HI, USA, July 21–26, pp. 3356–3365.

  • Bai, Y., Liu, J., Lou, Y., Wang, C., & Duan, L. (2022). Disentangled feature learning network and a comprehensive benchmark for vehicle re-identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 6854–6871.


  • Bai, Y., Lou, Y., Gao, F., Wang, S., Wu, Y., & Duan, L. (2018). Group-sensitive triplet embedding for vehicle reidentification. IEEE Transactions on Multimedia, 20(9), 2385–2399.


  • Bera, A., Wharton, Z., Liu, Y., Bessis, N., & Behera, A. (2022). SR-GNN: spatial relation-aware graph neural network for fine-grained image categorization. IEEE Transactions on Image Processing, 31, 6017–6031.


  • Branson, S., Horn, G.V., Belongie, S.J., Perona, P. (2014). Bird species categorization using pose normalized deep convolutional nets. CoRR:1406.2952

  • Cai, S., Zuo, W., & Zhang, L. (2017). Higher-order integration of hierarchical convolutional activations for fine-grained visual categorization. In: ICCV 2017, Venice, Italy, October 22–29, pp. 511–520

  • Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S. (2020). End-to-end object detection with transformers. In: Computer vision—ECCV 2020—16th European Conference, Glasgow, UK, August 23–28, Proceedings, Part I, pp. 213–229.

  • Chen, Y., Bai, Y., Zhang, W., & Mei, T. (2019). Destruction and construction learning for fine-grained image recognition. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, pp. 5157–5166.

  • Chen, B., Deng, W., & Hu, J. (2019). Mixed high-order attention network for person re-identification. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 371–381.

  • Chen, T., Lin, L., Chen, R., Wu, Y., & Luo, X. (2018). Knowledge-embedded representation learning for fine-grained image recognition. In: Lang, J. (ed.) Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI 2018, July 13–19, Stockholm, Sweden, pp. 627–634.

  • Chen, G., Lin, C., Ren, L., Lu, J., & Zhou, J. (2019). Self-critical attention learning for person re-identification. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 9636–9645.

  • Choi, S., Kim, J.T., & Choo, J. (2020). Cars can’t fly up in the sky: Improving urban-scene segmentation via height-driven attention networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–9, pp. 9370–9380.

  • Chu, R., Sun, Y., Li, Y., Liu, Z., Zhang, C., & Wei, Y. (2019) Vehicle re-identification with viewpoint-aware metric learning. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 8281–8290.

  • Cui, Y., Zhou, F., Wang, J., Liu, X., Lin, Y., & Belongie, S.J. (2017). Kernel pooling for convolutional neural networks. In: CVPR 2017, Honolulu, HI, USA, July 21–26, pp. 3049–3058

  • Deng, J., Guo, J., Xue, N., & Zafeiriou, S. (2019). Arcface: Additive angular margin loss for deep face recognition. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, pp. 4690–4699.

  • Dereli, O., Oguz, C., & Gönen, M. (2019). A multitask multiple kernel learning algorithm for survival analysis with application to cancer biology. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th international conference on machine learning, ICML 2019, 9–15 June 2019, Long Beach, California, USA. Proceedings of Machine Learning Research, vol. 97, pp. 1576–1585.

  • Diao, Q., Jiang, Y., Wen, B., Sun, J., & Yuan, Z. (2022). Metaformer: A unified meta framework for fine-grained recognition. CoRR:2203.02751

  • Ding, Y., Zhou, Y., Zhu, Y., Ye, Q., & Jiao, J. (2019). Selective sparse sampling for fine-grained image recognition. In: The IEEE international conference on computer vision (ICCV).

  • Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16x16 words: Transformers for image recognition at scale. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3–7.

  • Dubey, A., Gupta, O., Raskar, R., & Naik, N. (2018) Maximum-entropy fine grained classification. In: Bengio, S., Wallach, H.M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in neural information processing systems 31: Annual conference on neural information processing systems 2018, NeurIPS 2018, December 3–8, 2018, Montréal, Canada, pp. 635–645.

  • Fang, P., Zhou, J., Roy, S.K., Petersson, L., & Harandi, M. (2019). Bilinear attention networks for person retrieval. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 8029–8038.

  • Gao, Y., Beijbom, O., Zhang, N., & Darrell, T. (2016). Compact bilinear pooling. In: CVPR 2016, Las Vegas, NV, USA, June 27–30, pp. 317–326.

  • Ge, W., Lin, X., Yu, Y. (2019). Weakly supervised complementary parts models for fine-grained image classification from the bottom up. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, pp. 3034–3043.

  • Guo, Y., Zhang, L., Hu, Y., He, X., & Gao, J. (2016). Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016—14th European conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part III. Lecture Notes in Computer Science, vol. 9907, pp. 87–102.

  • He, J., Chen, J., Liu, S., Kortylewski, A., Yang, C., Bai, Y., & Wang, C. (2022). Transfg: A transformer architecture for fine-grained recognition. In: Thirty-Sixth AAAI conference on artificial intelligence, AAAI 2022, thirty-fourth conference on innovative applications of artificial intelligence, IAAI 2022, the twelfth symposium on educational advances in artificial intelligence, EAAI 2022 virtual event, February 22–March 1, 2022, pp. 852–860.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: CVPR 2016, Las Vegas, NV, USA, June 27–30, pp. 770–778.

  • He, X., Peng, Y., & Zhao, J. (2019). Which and how many regions to gaze: Focus discriminative regions for fine-grained visual categorization. International Journal of Computer Vision, 127(9), 1235–1255.


  • Horn, G.V., Aodha, O.M., Song, Y., Cui, Y., Sun, C., Shepard, A., Adam, H., Perona, P., & Belongie, S.J. (2018). The inaturalist species classification and detection dataset. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, pp. 8769–8778.

  • Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., & Chen, X. (2019) Interaction-and-aggregation network for person re-identification. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, 2019, pp. 9317–9326.

  • Hu, Y., Jin, X., Zhang, Y., Hong, H., Zhang, J., He, Y., & Xue, H. (2021). Rams-trans: Recurrent attention multi-scale transformer for fine-grained image recognition. In: Shen, H.T., Zhuang, Y., Smith, J.R., Yang, Y., César, P., Metze, F., Prabhakaran, B. (eds.) MM ’21: ACM multimedia conference, Virtual Event, China, October 20–24, pp. 4239–4248.

  • Hu, L., Zhang, P., Zhang, B., Pan, P., Xu, Y., & Jin, R. (2021). Learning position and target consistency for memory-based video object segmentation. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, Virtual, June 19–25, pp 4144–4154.

  • Huang, Z., & Li, Y. (2020). Interpretable and accurate fine-grained recognition via region grouping. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 8659–8669

  • Huang, S., Wang, X., & Tao, D. (2021). Stochastic partial swap: Enhanced model generalization and interpretability for fine-grained recognition. In: 2021 IEEE/CVF international conference on computer vision, ICCV 2021, Montreal, QC, Canada, October 10–17, pp. 600–609.

  • Huang, Y., Wang, Y., Tai, Y., Liu, X., Shen, P., Li, S., Li, J., & Huang, F. (2020). Curricularface: Adaptive curriculum learning loss for deep face recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, pp. 5900–5909.

  • Huang, S., Xu, Z., Tao, D., & Zhang, Y. (2016) Part-stacked CNN for fine-grained visual categorization. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27–30, pp. 1173–1182

  • Ji, R., Wen, L., Zhang, L., Du, D., Wu, Y., Zhao, C., Liu, X., & Huang, F. (2020). Attention convolutional binary neural tree for fine-grained visual categorization. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, pp. 10465–10474

  • Jin, X., Lan, C., Zeng, W., & Chen, Z. (2020). Uncertainty-aware multi-shot knowledge distillation for image-based object re-identification. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, the tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7–12, 2020, pp. 11165–11172.

  • Joung, S., Kim, S., Kim, M., Kim, I., & Sohn, K. (2021). Learning canonical 3d object representation for fine-grained recognition. In: 2021 IEEE/CVF international conference on computer vision, ICCV 2021, Montreal, QC, Canada, October 10–17, pp. 1015–1025.

  • Kalayeh, M.M., Basaran, E., Gökmen, M., Kamasak, M.E., & Shah, M. (2018). Human semantic parsing for person re-identification. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, pp. 1062–1071

  • Khorramshahi, P., Kumar, A., Peri, N., Rambhatla, S.S., Chen, J., & Chellappa, R. (2019) A dual-path model with adaptive attention for vehicle re-identification. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 6131–6140.

  • Khorramshahi, P., Peri, N., Chen, J., & Chellappa, R. (2020). The devil is in the details: Self-supervised attention for vehicle re-identification. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J. (eds.) Computer vision—ECCV 2020—16th European conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XIV. Lecture Notes in Computer Science, vol. 12359, pp. 369–386.

  • Kim, M., Jain, A.K., & Liu, X. (2022). Adaface: Quality adaptive margin for face recognition. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, pp. 18729–18738.

  • Kim, Y., Park, W., & Shin, J. (2020). Broadface: Looking at tens of thousands of people at once for face recognition. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J. (eds.) Computer vision—ECCV 2020—16th European conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part IX. Lecture Notes in Computer Science, vol. 12354, pp. 536–552.

  • Kong, S., & Fowlkes, C.C. (2017). Low-rank bilinear pooling for fine-grained classification. In: CVPR 2017, Honolulu, HI, USA, July 21–26, pp. 7025–7034.

  • Krause, J., Jin, H., Yang, J., & Li, F. (2015). Fine-grained recognition without part annotations. In: CVPR 2015, Boston, MA, USA, June 7–12, pp. 5546–5555

  • Krause, J., Stark, M., Deng, J., & Fei-Fei, L. (2013). 3d object representations for fine-grained categorization. In: ICCV Workshops 2013, Sydney, Australia, December 1–8, pp. 554–561

  • Kumar, R., Weill, E., Aghdasi, F., & Sriram, P. (2019). Vehicle re-identification: an efficient baseline using triplet embedding. In: International joint conference on neural networks, IJCNN 2019 Budapest, Hungary, July 14–19, pp. 1–9.

  • Li, S., Xu, J., Xu, X., Shen, P., Li, S., & Hooi, B. (2021) Spherical confidence learning for face recognition. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, Virtual, June 19–25, pp. 15629–15637.

  • Lin, T., Roy Chowdhury, A., & Maji, S. (2015). Bilinear CNN models for fine-grained visual recognition. In: ICCV 2015, Santiago, Chile, December 7–13, pp. 1449–1457.

  • Liu, X., Liu, W., Mei, T., & Ma, H. (2016). A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016—14th European conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II. Lecture Notes in Computer Science, vol. 9906, pp. 869–884.

  • Liu, H., Tian, Y., Wang, Y., Pang, L., & Huang, T. (2016). Deep relative distance learning: Tell the difference between similar vehicles. In: 2016 IEEE conference on computer vision and pattern recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, pp. 2167–2175.

  • Liu, J., Wu, Y., Wu, Y., Li, C., Hu, X., Liang, D., & Wang, M. (2021). DAM: discrepancy alignment metric for face recognition. In: 2021 IEEE/CVF international conference on computer vision, ICCV2021, Montreal, QC, Canada, October 10-17, pp. 3794–3803.

  • Liu, Y., Zhou, L., Zhang, P., Bai, X., Gu, L., Yu, X., Zhou, J., Hancock, E.R. (2022). Where to focus: Investigating hierarchical attention relationship for fine-grained visual classification. In: Avidan, S., Brostow, G.J., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision—ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23-27, Proceedings, Part XXIV. Lecture Notes in Computer Science, vol. 13684, pp. 57–73.

  • Lou, Y., Bai, Y., Liu, J., Wang, S., & Duan, L. (2019). Veri-wild: A large dataset and a new method for vehicle re-identification in the wild. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16–20, pp. 3235–3243.

  • Lou, Y., Bai, Y., Liu, J., Wang, S., & Duan, L. (2019). Embedding adversarial learning for vehicle re-identification. IEEE Transactions on Image Processing, 28(8), 3794–3807.


  • Luo, C., Chen, Y., Wang, N., & Zhang, Z. (2019). Spectral feature transformation for person re-identification. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 4975–4984.

  • Maji, S., Rahtu, E., Kannala, J., Blaschko, M.B., & Vedaldi, A. (2013). Fine-grained visual classification of aircraft. CoRR:1306.5151

  • Maze, B., Adams, J.C., Duncan, J.A., Kalka, N.D., Miller, T., Otto, C., Jain, A.K., Niggel, W.T., Anderson, J., Cheney, J., & Grother, P. (2018). IARPA janus benchmark-C: Face dataset and protocol. In: 2018 International conference on biometrics, ICB 2018, Gold Coast, Australia, February 20–23, pp. 158–165.

  • Meng, D., Li, L., Liu, X., Li, Y., Yang, S., Zha, Z., Gao, X., Wang, S., & Huang, Q. (2020). Parsing-based view-aware embedding network for vehicle re-identification. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, pp. 7101–7110.

  • Meng, Q., Zhao, S., Huang, Z., & Zhou, F. (2021). Magface: A universal representation for face recognition and quality assessment. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, Virtual, June 19–25, pp. 14225–14234.

  • Peng, Y., He, X., & Zhao, J. (2018). Object-part attention model for fine-grained image classification. IEEE Transactions on Image Processing, 27(3), 1487–1500.


  • Quan, R., Dong, X., Wu, Y., Zhu, L., & Yang, Y. (2019). Auto-reid: Searching for a part-aware convnet for person re-identification. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 3749–3758.

  • Rao, Y., Chen, G., Lu, J., & Zhou, J. (2021). Counterfactual attention learning for fine-grained visual categorization and re-identification. In: 2021 IEEE/CVF international conference on computer vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pp. 1005–1014.

  • Recasens, A., Kellnhofer, P., Stent, S., Matusik, W., & Torralba, A. (2018). Learning to zoom: A saliency-based sampling layer for neural networks. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer vision—ECCV 2018—15th European conference, Munich, Germany, September 8–14, Proceedings, Part IX. Lecture Notes in Computer Science, vol. 11213, pp. 52–67.

  • Ristani, E., Solera, F., Zou, R.S., Cucchiara, R., & Tomasi, C. (2016). Performance measures and a data set for multi-target, multi-camera tracking. In: Hua, G., Jégou, H. (eds.) Computer Vision—ECCV 2016 Workshops—Amsterdam, The Netherlands, October 8–10 and 15–16, Proceedings, Part II. Lecture Notes in Computer Science, vol. 9914, pp. 17–35.

  • Saunders, B., Camgöz, N. C., & Bowden, R. (2021). Continuous 3d multi-channel sign language production via progressive transformers and mixture density networks. International Journal of Computer Vision, 129(7), 2113–2135.


  • Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In: IEEE international conference on computer vision, ICCV 2017, Venice, Italy, October 22–29, pp. 618–626.

  • Shen, F., Zhu, J., Zhu, X., Xie, Y., & Huang, J. (2022). Exploring spatial significance via hybrid pyramidal graph network for vehicle re-identification. IEEE Transactions on Intelligent Transportation Systems, 23(7), 8793–8804.


  • Shi, Y., Yu, X., Sohn, K., Chandraker, M., & Jain, A.K. (2020). Towards universal representation learning for deep face recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, pp. 6816–6825.

  • Shin, A., Ishii, M., & Narihira, T. (2022). Perspectives and prospects on transformer architecture for cross-modal tasks with language and vision. International Journal of Computer Vision, 130(2), 435–454.


  • Sun, Y., Cheng, C., Zhang, Y., Zhang, C., Zheng, L., Wang, Z., & Wei, Y. (2020). Circle loss: A unified perspective of pair similarity optimization. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, 2020, pp. 6397–6406.

  • Sun, H., He, X., & Peng, Y. (2022). Sim-trans: Structure information modeling transformer for fine-grained visual categorization. In: Magalhães, J., Bimbo, A.D., Satoh, S., Sebe, N., Alameda-Pineda, X., Jin, Q., Oria, V., Toni, L. (eds.) MM ’22: The 30th ACM international conference on multimedia, Lisboa, Portugal, October 10–14, pp. 5853–5861.

  • Sun, M., Yuan, Y., Zhou, F., & Ding, E. (2018) Multi-attention multi-class constraint for fine-grained image recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XVI. Lecture Notes in Computer Science, vol. 11220, pp. 834–850.

  • Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and A strong convolutional baseline). In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018—15th European conference, Munich, Germany, September 8–14, 2018, Proceedings, Part IV. Lecture Notes in Computer Science, vol. 11208, pp. 501–518.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In: Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R. (eds.) Advances in neural information processing systems 30: Annual conference on neural information processing systems 2017, December 4–9, Long Beach, CA, USA, pp. 5998–6008.

  • Wang, S., Chang, J., Li, H., Wang, Z., Ouyang, W., & Tian, Q. (2023). Open-set fine-grained retrieval via prompting vision-language evaluator. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp. 19381–19391

  • Wang, S., Li, H., Wang, Z., & Ouyang, W. (2021). Dynamic position-aware network for fine-grained image recognition. In: Thirty-fifth AAAI conference on artificial intelligence, AAAI 2021, thirty-third conference on innovative applications of artificial intelligence, IAAI 2021, the eleventh symposium on educational advances in artificial intelligence, EAAI 2021, virtual event, February 2–9, 2021, pp. 2791–2799

  • Wang, S., Wang, Z., Li, H., & Ouyang, W. (2020). Category-specific semantic coherency learning for fine-grained image recognition. In: MM ’20: The 28th ACM international conference on multimedia, virtual event/seattle, WA, USA, October 12–16, pp. 174–183.

  • Wang, S., Wang, Z., Li, H., & Ouyang, W. (2022) Category-specific nuance exploration network for fine-grained object retrieval. In: Thirty-sixth AAAI conference on artificial intelligence, AAAI 2022, thirty-fourth conference on innovative applications of artificial intelligence, IAAI 2022, the twelfth symposium on educational advances in artificial intelligence, EAAI 2022 virtual event, February 22–March 1, pp. 2513–2521.

  • Wang, S., Wang, Z., Li, H., Chang, J., Ouyang, W., & Tian, Q. (2023) Semantic-guided information alignment network for fine-grained image recognition. IEEE Transactions on Circuits and Systems for Video Technology.

  • Wang, Z., Wang, S., Li, H., Dou, Z., & Li, J. (2020). Graph-propagation based correlation learning for weakly supervised fine-grained image classification. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI 2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, the tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7–12, pp. 12289–12296.

  • Wang, Z., Wang, S., Yang, S., Li, H., Li, J., & Li, Z. (2020). Weakly supervised fine-grained image classification via Gaussian mixture model oriented discriminative learning. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, pp. 9746–9755

  • Wang, Z., Wang, S., Zhang, P., Li, H., Zhong, W., & Li, J. (2019). Weakly supervised fine-grained image classification via correlation-guided discriminative learning. In: Proceedings of the 27th ACM international conference on multimedia, MM 2019, Nice, France, October 21–25, pp. 1851–1860.

  • Wang, X., Zhang, S., Wang, S., Fu, T., Shi, H., & Mei, T. (2020). Mis-classified vector guided softmax loss for face recognition. In: The thirty-fourth AAAI conference on artificial intelligence, AAAI 2020, the thirty-second innovative applications of artificial intelligence conference, IAAI 2020, the tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7–12, pp. 12241–12248.

  • Wei, X., Xie, C., Wu, J. (2016). Mask-cnn: Localizing parts and selecting descriptors for fine-grained image recognition. CoRR:1605.06878

  • Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer GAN to bridge domain gap for person re-identification. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, pp. 79–88.

  • Wei, X., Luo, J., Wu, J., & Zhou, Z. (2017). Selective convolutional descriptor aggregation for fine-grained image retrieval. IEEE Transactions on Image Processing, 26(6), 2868–2881.


  • Whitelam, C., Taborsky, E., Blanton, A., Maze, B., Adams, J.C., Miller, T., Kalka, N.D., Jain, A.K., Duncan, J.A., Allen, K., Cheney, J., & Grother, P. (2017). IARPA janus benchmark-b face dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2017, Honolulu, HI, USA, July 21–26, pp. 592–600.

  • Xu, K., Lai, R., Gu, L., & Li, Y. (2021). Multiresolution discriminative mixup network for fine-grained visual categorization. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–13.

  • Yang, Z., Luo, T., Wang, D., Hu, Z., Gao, J., & Wang, L. (2018). Learning to navigate for fine-grained classification. In: ECCV, Germany, September 8-14, 2018, Proceedings, Part XIV, pp. 438–454.

  • Yang, X., Wang, Y., Chen, K., Xu, Y., & Tian, Y. (2022). Fine-grained object classification via self-supervised pose alignment. In: IEEE/CVF conference on computer vision and pattern recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, pp. 7389–7398.

  • Yao, H., Zhang, S., Yan, C., Zhang, Y., Li, J., & Tian, Q. (2018). Autobd: Automated bi-level description for scalable fine-grained visual categorization. IEEE Transactions on Image Processing, 27(1), 10–23.


  • Yue, X., Kuang, Z., Lin, C., Sun, H., & Zhang, W. (2020). Robustscanner: Dynamically enhancing positional clues for robust text recognition. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J. (eds.) Computer vision—ECCV 2020—16th European conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XIX. Lecture Notes in Computer Science, vol. 12364, pp. 135–151.

  • Zhang, L., Huang, S., Liu, W., Tao, D. (2019). Learning a mixture of granularity-specific experts for fine-grained categorization. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 8330–8339.

  • Zhang, X., Xiong, H., Zhou, W., Lin, W., & Tian, Q. (2016). Picking deep filter responses for fine-grained image recognition. In: CVPR 2016, Las Vegas, NV, USA, June 27–30, pp. 1134–1142.

  • Zhao, Y., Yan, K., Huang, F., & Li, J. (2021). Graph-based high-order relation discovery for fine-grained recognition. In: IEEE conference on computer vision and pattern recognition, CVPR 2021, Virtual, June 19–25, pp. 15079–15088.

  • Zhao, Y., Li, J., Chen, X., & Tian, Y. (2021). Part-guided relational transformers for fine-grained visual recognition. IEEE Transactions on Image Processing, 30, 9470–9481.


  • Zheng, H., Fu, J., Zha, Z., & Luo, J. (2019). Learning deep bilinear transformation for fine-grained image representation. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, & R. Garnett (Eds.), Advances in neural information processing systems 32: Annual conference on neural information processing systems 2019, NeurIPS 2019, 8–14 December 2019 (pp. 4279–4288). Vancouver.

  • Zheng, H., Fu, J., Zha, Z., & Luo, J. (2019). Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, pp. 5012–5021.

  • Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., & Kautz, J. (2019). Joint discriminative and generative learning for person re-identification. In: IEEE conference on computer vision and pattern recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, pp. 2138–2147.

  • Zheng, H., Fu, J., Zha, Z., Luo, J., & Mei, T. (2020). Learning rich part hierarchies with progressive attention networks for fine-grained image recognition. IEEE Transactions on Image Processing, 29, 476–488.


  • Zhou, Y., & Shao, L. (2018). Viewpoint-aware attentive multi-view inference for vehicle re-identification. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, Salt Lake City, UT, USA, June 18–22, pp. 6489–6498.

  • Zhou, M., Bai, Y., Zhang, W., Zhao, T., & Mei, T. (2020). Look-into-object: Self-supervised structure modeling for object recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition, CVPR 2020, Seattle, WA, USA, June 13–19, pp. 11771–11780.

  • Zhou, K., Yang, Y., Cavallaro, A., & Xiang, T. (2019). Omni-scale feature learning for person re-identification. In: 2019 IEEE/CVF international conference on computer vision, ICCV 2019, Seoul, Korea (South), October 27–November 2, pp. 3701–3711.


Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 61976038 and No. 61932020, and by the Taishan Scholar Program of Shandong Province.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haojie Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, S., Wang, Z., Li, H. et al. Accurate Fine-Grained Object Recognition with Structure-Driven Relation Graph Networks. Int J Comput Vis 132, 137–160 (2024). https://doi.org/10.1007/s11263-023-01873-z

