Abstract
A good similarity metric should be consistent with human perception of similarity: a sparrow is more similar to an owl when compared with a dog, but more similar to a dog when compared with a car. Whether two images belong to the same class therefore depends on the semantic level. Most existing metric learning methods push apart interclass samples and pull together intraclass samples, which becomes contradictory when the labels span several semantic levels. The core problem is that a negative pair on a finer semantic level can be a positive pair on a coarser semantic level, so pushing this pair apart damages the class structure on the coarser level. We identify negative repulsion as the key obstacle in existing methods: a positive pair remains positive on every coarser semantic level, whereas a negative pair does not necessarily remain negative. Our solution, cross-level concept distillation (CLCD), is simple in concept: we only pull positive pairs closer. To facilitate the cross-level semantic structure of the image representations, we propose a hierarchical concept refiner that constructs multiple levels of concept embeddings for an image and then pulls the corresponding concepts closer together. Extensive experiments demonstrate that the proposed CLCD method outperforms all other competing methods on the hierarchically labeled datasets. Code is available at: https://github.com/wzzheng/CLCD.
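The asymmetry between positive and negative pairs can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the CLCD implementation: the `LABELS` hierarchy, the sample names, and the helper functions are hypothetical, and the squared-distance loss is only a stand-in for a pull-only objective. It checks that a pair that is positive on a fine level stays positive on every coarser level, while some fine-level negative pairs become positive on a coarser level.

```python
from itertools import combinations

# Hypothetical toy hierarchy (an assumption for illustration only).
# Each sample carries one label per semantic level, ordered fine -> coarse.
LABELS = {
    "sparrow_1": ("sparrow", "bird", "animal"),
    "sparrow_2": ("sparrow", "bird", "animal"),
    "owl_1": ("owl", "bird", "animal"),
    "dog_1": ("dog", "mammal", "animal"),
    "car_1": ("car", "vehicle", "object"),
}
NUM_LEVELS = 3


def is_positive(a, b, level):
    """Two samples form a positive pair if they share the label on `level`."""
    return LABELS[a][level] == LABELS[b][level]


def flipping_pairs(fine, coarse):
    """Pairs that are negative on the `fine` level but positive on `coarse`."""
    return [
        (a, b)
        for a, b in combinations(sorted(LABELS), 2)
        if not is_positive(a, b, fine) and is_positive(a, b, coarse)
    ]


def positives_stay_positive():
    """Check that every positive pair is also positive on all coarser levels."""
    return all(
        is_positive(a, b, coarse)
        for a, b in combinations(sorted(LABELS), 2)
        for fine in range(NUM_LEVELS)
        for coarse in range(fine, NUM_LEVELS)
        if is_positive(a, b, fine)
    )


def positive_only_loss(embeddings, level):
    """Mean squared distance over positive pairs only (no negative repulsion).

    This is a generic pull-only loss used as a stand-in, not the CLCD loss.
    """
    pairs = [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if is_positive(a, b, level)
    ]
    if not pairs:
        return 0.0
    total = sum(
        sum((x - y) ** 2 for x, y in zip(embeddings[a], embeddings[b]))
        for a, b in pairs
    )
    return total / len(pairs)
```

In this toy hierarchy, `flipping_pairs(0, 1)` returns the owl and sparrow pairs: repelling them on the species level would work against the bird-level structure, whereas a loss such as `positive_only_loss` never applies a repulsive term and so cannot conflict with a coarser level.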
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 62125603 and Grant U1813218, and in part by a grant from the Beijing Academy of Artificial Intelligence (BAAI).
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zheng, W., Huang, Y., Zhang, B., Zhou, J., Lu, J. (2022). Dynamic Metric Learning with Cross-Level Concept Distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13684. Springer, Cham. https://doi.org/10.1007/978-3-031-20053-3_12
DOI: https://doi.org/10.1007/978-3-031-20053-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20052-6
Online ISBN: 978-3-031-20053-3
eBook Packages: Computer Science, Computer Science (R0)