Abstract
Fine-grained image recognition (FGIR) is more challenging than general image recognition because the variation between objects is inherently subtle. Existing FGIR methods are mainly based on single-granularity feature fusion. The fused features they extract often fail to fully reflect the characteristics of the object, and recognition results based on those features lack interpretability. To address these problems, we propose a novel end-to-end trusted multi-granularity information fusion (TMGIF) model for weakly supervised fine-grained image recognition. TMGIF automatically extracts a multi-granularity information representation of a fine-grained image, evaluates the quality of each information granule, and then progressively fuses the multi-granularity information according to that quality to obtain a reliable and interpretable recognition result. We evaluate TMGIF on three standard benchmark datasets and demonstrate that the proposed method provides competitive results.
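The abstract does not spell out how granule quality is measured or how the fusion step is computed. As an illustration only, the sketch below assumes one common formulation of trusted fusion: each granularity produces non-negative class evidence, the evidence is turned into a subjective-logic opinion (per-class belief plus an overall uncertainty), and opinions are merged one granularity at a time with a reduced Dempster combination rule. The function names, the evidence values, and the choice of combination rule are all assumptions made for this example, not details taken from the TMGIF model.

```python
from functools import reduce


def opinion_from_evidence(evidence):
    """Turn non-negative per-class evidence into (beliefs, uncertainty).

    Assumes a Dirichlet parameterisation alpha_k = e_k + 1, so that the
    beliefs and the uncertainty sum to 1. Low total evidence means high
    uncertainty, which serves here as a stand-in for granule quality.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def fuse(op_a, op_b):
    """Combine two opinions with a reduced Dempster rule (assumed here)."""
    (b1, u1), (b2, u2) = op_a, op_b
    # Conflict: belief mass the two opinions assign to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return beliefs, uncertainty


def progressive_fusion(evidences):
    """Fuse opinions in order, e.g. from coarse to fine granularity."""
    opinions = [opinion_from_evidence(e) for e in evidences]
    return reduce(fuse, opinions)


# Hypothetical evidence for 3 classes at three granularities.
evidences = [
    [1.0, 1.0, 1.0],  # coarse: little, undecided evidence
    [4.0, 1.0, 0.0],  # medium: leans towards class 0
    [9.0, 0.0, 1.0],  # fine: strong evidence for class 0
]
fused_beliefs, fused_uncertainty = progressive_fusion(evidences)
predicted_class = max(range(len(fused_beliefs)), key=fused_beliefs.__getitem__)
```

Under this assumed rule, a granularity with high uncertainty contributes little to the fused beliefs, and the fused uncertainty can be reported alongside the predicted class as a rough indication of how much the decision should be trusted.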
Acknowledgements
This paper was supported by the National Natural Science Foundation of China (Nos. 62163016, 62066014), the Natural Science Foundation of Jiangxi Province (20212ACB202001, 20202BABL202018), the Double Thousand Plan of Jiangxi Province of China, and the State Key Laboratory of Computer Science Open Subject Fund (CN) under Grant SYSKF2102.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yu, Y., Tang, H., Qian, J. et al. Fine-grained image recognition via trusted multi-granularity information fusion. Int. J. Mach. Learn. & Cyber. 14, 1105–1117 (2023). https://doi.org/10.1007/s13042-022-01685-6
DOI: https://doi.org/10.1007/s13042-022-01685-6