Abstract
Fast and accurate retrieval of fine-grained images is a critical and challenging task. Deep metric learning, the key technology for fine-grained image retrieval, aims to learn an embedding space in which samples exhibit two properties, positive concentration and negative separation, so that similarities between samples can be measured reliably. Unsupervised deep metric learning, which requires no labels during training, has attracted wide attention because it is more convenient than its supervised counterparts. However, current unsupervised deep metric learning methods suffer from imbalanced sample construction, difficulty in differentiating samples, and neglect of intrinsic image features. To address these challenges, we propose Manifold and Patch-based Unsupervised Deep Metric Learning (MPUDML) for fine-grained image retrieval. Specifically, we adopt a balanced sampling strategy based on manifold similarity to construct more balanced mini-batches. Moreover, we differentiate samples using soft supervision derived from the manifold and cosine similarities between unlabeled images, which reduces the impact of noisy samples. Additionally, we exploit the rich feature information among the patches within an image through patch-level clustering and localization tasks, which guide the model toward a more comprehensive feature embedding and thereby improve retrieval performance. We evaluated MPUDML against various state-of-the-art unsupervised deep metric learning approaches on fine-grained image retrieval and clustering tasks. The experimental results indicate that MPUDML outperforms the compared methods in Recall@K (R@K) and Normalized Mutual Information (NMI).
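For readers unfamiliar with the retrieval metric named above, the following is a minimal sketch of how Recall@K is commonly computed from embeddings: a query counts as a hit if at least one of its K nearest neighbours (excluding itself) shares its class label. The function name `recall_at_k`, the use of cosine similarity for ranking, and the default values of K are assumptions made for illustration; the abstract does not specify the evaluation code.

```python
import numpy as np


def recall_at_k(embeddings, labels, ks=(1, 2, 4, 8)):
    """Illustrative Recall@K (hypothetical helper, not the authors' code).

    A query is a hit at K if any of its K most similar samples,
    excluding the query itself, has the same label.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)

    # L2-normalise rows so the dot product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T

    # A query must never retrieve itself.
    np.fill_diagonal(sim, -np.inf)

    # Neighbour indices sorted by decreasing similarity.
    order = np.argsort(-sim, axis=1)
    top = order[:, : max(ks)]

    # hits[i, j] is True if the j-th neighbour of query i shares its label.
    hits = y[top] == y[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}


if __name__ == "__main__":
    # Toy data: two tight pairs of points, one pair per class.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(recall_at_k(emb, [0, 0, 1, 1], ks=(1, 2)))
```

Each value of K must be smaller than the number of samples, since the query itself is excluded from its own neighbour list.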
Data availability and access
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 62262006, the Guangxi Science and Technology Key Research and Development Program under Grant No. AB24010112, the Key Project of Science and Technology Research Program of Chongqing Education Commission of China under Grant No. KJZD-K202402501, the State Key Laboratory of Geo-Information Engineering and the Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, CASM under Grant No. 2023-04-03, and the Heavy Rainfall Research Foundation of China under Grant No. BYKJ2024Z12.
Author information
Contributions
Shi-hao Yuan designed the research plan and conducted key experimental validations. Yong Feng (co-corresponding author) provided critical academic guidance and participated in the optimization of the methodology. A-Gen Qiu (co-corresponding author) was responsible for the experimental components and made the final modifications to the methods. Guo-fan Duan and Ming-liang Zhou conducted the data analysis and wrote the initial draft. Bao-hua Qiang and Yong-heng Wang reviewed and revised the entire manuscript. All authors participated in the writing and revision of the manuscript and approved the final version of the content.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
Not applicable.
About this article
Cite this article
Yuan, Sh., Feng, Y., Qiu, AG. et al. Manifold and patch-based unsupervised deep metric learning for fine-grained image retrieval. Appl Intell 55, 96 (2025). https://doi.org/10.1007/s10489-024-05926-9
DOI: https://doi.org/10.1007/s10489-024-05926-9