Manifold and patch-based unsupervised deep metric learning for fine-grained image retrieval

Applied Intelligence

Abstract

Retrieving fine-grained images accurately and quickly is a critical and challenging task. Deep metric learning, the key technology for fine-grained image retrieval, aims to learn a mapping space in which samples exhibit two properties, positive concentration and negative separation, which makes similarities between samples easier to measure. Unsupervised deep metric learning requires no labels during training and has therefore attracted wider attention than its supervised counterparts. Current unsupervised deep metric learning methods suffer from imbalanced sample construction, difficulty in differentiating samples, and neglect of intrinsic image features. To address these challenges, we propose Manifold and Patch-based Unsupervised Deep Metric Learning (MPUDML) for fine-grained image retrieval. Specifically, we adopt a balanced sampling strategy based on manifold similarity to construct more balanced mini-batches. We also use soft supervision information, derived from the manifold and cosine similarities between unlabeled images, to differentiate samples, which reduces the impact of noisy samples. In addition, we exploit the rich feature information among the patches within an image through patch-level clustering and localization tasks, which guide the model toward a more comprehensive feature embedding and thereby improve retrieval performance. We evaluated MPUDML against various state-of-the-art unsupervised deep metric learning approaches on fine-grained image retrieval and clustering tasks. The experimental results indicate that MPUDML outperforms these methods in recall (R@K) and Normalized Mutual Information (NMI).
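The recall metric (R@K) named in the abstract can be illustrated with a minimal sketch. It assumes the definition commonly used in deep metric learning: a query counts as a hit if at least one of its K nearest neighbours, ranked by cosine similarity and excluding the query itself, shares its class label. The exact evaluation protocol of the paper is not visible in this preview, and the function names and toy data below are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with a same-label sample among their k nearest neighbours.

    Assumed (standard) definition; every sample is used once as a query and
    is excluded from its own neighbour list.
    """
    n = len(embeddings)
    hits = 0
    for i in range(n):
        # Rank all other samples by similarity to query i, most similar first.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_similarity(embeddings[i], embeddings[j]),
            reverse=True,
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / n


# Hypothetical 2-D embeddings; labels are used only for evaluation.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.6]]
toy_labels = [0, 0, 1, 1, 1]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_embeddings, toy_labels, k):.2f}")
```

In this toy set the last sample lies closer to the class-0 points than to its own class, so it is a miss at K=1 and K=2 and only becomes a hit at K=3. Labels enter only at evaluation time, which is consistent with training being unsupervised.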

Data availability and access

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 62262006, the Guangxi Science and Technology Key Research and Development Program under Grant No. AB24010112, the Key Project of Science and Technology Research Program of Chongqing Education Commission of China under Grant No. KJZD-K202402501, the State Key Laboratory of Geo-Information Engineering and the Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, CASM under Grant No. 2023-04-03, and Heavy Rainfall Research Foundation of China under Grant No. BYKJ2024Z12.

Author information

Contributions

Shi-hao Yuan designed the research plan and conducted key experimental validations. Yong Feng (co-corresponding author) provided critical academic guidance and participated in the optimization of the methodology. A-Gen Qiu (co-corresponding author) was responsible for the experimental components and made the final modifications to the methods. Guo-fan Duan and Ming-liang Zhou conducted the data analysis and wrote the initial draft. Bao-hua Qiang and Yong-heng Wang reviewed and revised the entire manuscript. All authors participated in the writing and revision of the manuscript and approved the final version of the content.

Corresponding authors

Correspondence to Yong Feng or A-Gen Qiu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yuan, Sh., Feng, Y., Qiu, AG. et al. Manifold and patch-based unsupervised deep metric learning for fine-grained image retrieval. Appl Intell 55, 96 (2025). https://doi.org/10.1007/s10489-024-05926-9

  • DOI: https://doi.org/10.1007/s10489-024-05926-9
