Manifold and patch-based unsupervised deep metric learning for fine-grained image retrieval

Yuan, Shi-hao; Feng, Yong; Qiu, A-Gen; Duan, Guo-fan; Zhou, Ming-liang; Qiang, Bao-hua; Wang, Yong-heng

doi:10.1007/s10489-024-05926-9

Published: 07 December 2024

Volume 55, article number 96 (2025)

Applied Intelligence

Shi-hao Yuan¹,
Yong Feng ORCID: orcid.org/0000-0002-8820-8388¹,
A-Gen Qiu²,
Guo-fan Duan³,
Ming-liang Zhou¹,
Bao-hua Qiang⁴ &
Yong-heng Wang⁵

Abstract

Retrieving fine-grained images accurately and quickly is a critical and challenging task. Deep metric learning, the key technology for fine-grained image retrieval, aims to learn a mapping space in which samples exhibit two properties, positive concentration and negative separation, which facilitates measuring similarities between samples. Unsupervised deep metric learning, which requires no labels during training, has attracted widespread attention because it is more convenient than its supervised counterparts. Current unsupervised deep metric learning methods suffer from imbalanced sample construction, difficulty in differentiating samples, and neglect of intrinsic image features. To address these challenges, we propose Manifold and Patch-based Unsupervised Deep Metric Learning (MPUDML) for fine-grained image retrieval. Specifically, we adopt a balanced sampling strategy based on manifold similarity to construct more balanced mini-batches. Moreover, we derive soft supervision information from the manifold and cosine similarities between unlabeled images to differentiate samples, which effectively reduces the impact of noisy samples. Additionally, we exploit the rich feature information among the patches within an image through patch-level clustering and localization tasks, which guide the model toward a more comprehensive feature embedding and thereby enhance retrieval performance. We evaluated MPUDML against various state-of-the-art unsupervised deep metric learning approaches on fine-grained image retrieval and clustering tasks. Experimental results indicate that MPUDML outperforms other advanced methods in Recall@K (R@K) and Normalized Mutual Information (NMI).
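The two evaluation metrics named in the abstract, R@K and NMI, can be illustrated with a minimal standard-library sketch. This is not the authors' evaluation code: the function names (`cosine`, `recall_at_k`, `entropy`, `nmi`) and the toy data are hypothetical, neighbours are assumed to be ranked by cosine similarity, and NMI is assumed to use the arithmetic-mean normalisation 2·I(Y;C) / (H(Y) + H(C)). The article's exact protocol may differ.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with a same-label item among their k nearest neighbours."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by cosine similarity to the query, most similar first.
        ranked = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(query, embeddings[j]),
            reverse=True,
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / len(embeddings)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of discrete assignments."""
    n = len(assignments)
    return -sum((c / n) * math.log(c / n) for c in Counter(assignments).values())


def nmi(labels, clusters):
    """Normalized mutual information, assuming arithmetic-mean normalisation."""
    n = len(labels)
    joint = Counter(zip(labels, clusters))
    count_y, count_c = Counter(labels), Counter(clusters)
    # I(Y; C) = sum over cells of p(y, c) * log(p(y, c) / (p(y) * p(c))).
    mi = sum(
        (n_yc / n) * math.log(n_yc * n / (count_y[y] * count_c[c]))
        for (y, c), n_yc in joint.items()
    )
    denom = entropy(labels) + entropy(clusters)
    return 2 * mi / denom if denom > 0 else 1.0


# Toy data (hypothetical): 2-D embeddings; labels are used only for evaluation.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.8]]
labels = [0, 0, 1, 1, 0]
clusters = [0, 0, 1, 1, 1]  # a clustering that misplaces the last item

print(recall_at_k(embeddings, labels, k=1))
print(recall_at_k(embeddings, labels, k=3))
print(nmi(labels, clusters))
```

In this toy set the last item lies closest to the two class-1 items, so it counts as a miss at K=1 and becomes a hit only at K=3, which is why R@K can only stay the same or rise as K grows. NMI is invariant to renaming cluster ids, which makes it suitable for comparing a clustering against ground-truth classes.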

Data availability and access

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 62262006, the Guangxi Science and Technology Key Research and Development Program under Grant No. AB24010112, the Key Project of Science and Technology Research Program of Chongqing Education Commission of China under Grant No. KJZD-K202402501, the State Key Laboratory of Geo-Information Engineering and the Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, CASM under Grant No. 2023-04-03, and the Heavy Rainfall Research Foundation of China under Grant No. BYKJ2024Z12.

Author information

Authors and Affiliations

College of Computer Science, Chongqing University, Chongqing 401331, China, and Heavy Rainfall Research Center of China, No.3, Donghu East Road, Hongshan District, Wuhan, 401311, China
Shi-hao Yuan, Yong Feng & Ming-liang Zhou
State Key Laboratory of Geo-Information Engineering and Key Laboratory of Surveying and Mapping Science and Geospatial Information Technology of MNR, CASM, Beijing, 100036, China
A-Gen Qiu
Chongqing Metropolitan College of Science and Technology, Chongqing, 402167, China
Guo-fan Duan
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, 541004, China
Bao-hua Qiang
8# of Zhejiang Lab, Yuhang district, Hangzhou, 311121, China
Yong-heng Wang

Contributions

Shi-hao Yuan designed the research plan and conducted key experimental validations. Yong Feng (co-corresponding author) provided critical academic guidance and participated in the optimization of the methodology. A-Gen Qiu (co-corresponding author) was responsible for the experimental components and made the final modifications to the methods. Guo-fan Duan and Ming-liang Zhou conducted the data analysis and wrote the initial draft. Bao-hua Qiang and Yong-heng Wang reviewed and revised the entire manuscript. All authors participated in the writing and revision of the manuscript and approved the final version of the content.

Corresponding authors

Correspondence to Yong Feng or A-Gen Qiu.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yuan, Sh., Feng, Y., Qiu, AG. et al. Manifold and patch-based unsupervised deep metric learning for fine-grained image retrieval. Appl Intell 55, 96 (2025). https://doi.org/10.1007/s10489-024-05926-9

Accepted: 16 October 2024
Published: 07 December 2024
DOI: https://doi.org/10.1007/s10489-024-05926-9
