Dynamic Metric Learning with Cross-Level Concept Distillation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13684)

Abstract

A good similarity metric should be consistent with human perception of similarity: a sparrow is more similar to an owl than to a dog, but more similar to a dog than to a car. Whether two images belong to the same class therefore depends on the semantic level under consideration. Most existing metric learning methods push interclass samples apart and pull intraclass samples together, which becomes contradictory when the labels span several semantic levels. The core problem is that a negative pair on a finer semantic level can be a positive pair on a coarser semantic level, so pushing this pair apart damages the class structure on the coarser level. We identify negative repulsion as the key obstacle in existing methods: a positive pair remains positive on every coarser semantic level, whereas a negative pair does not necessarily remain negative. Our solution, cross-level concept distillation (CLCD), is simple in concept: we only pull positive pairs closer. To facilitate the cross-level semantic structure of the image representations, we propose a hierarchical concept refiner that constructs multiple levels of concept embeddings for an image and then pulls the corresponding concepts closer. Extensive experiments demonstrate that the proposed CLCD method outperforms all other competing methods on the hierarchically labeled datasets. Code is available at: https://github.com/wzzheng/CLCD.
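
The label-side observation in the abstract can be illustrated with a small sketch. This is not the authors' CLCD implementation and contains no hierarchical concept refiner; the two-level hierarchy `FINE_TO_COARSE`, the helper names, and the toy embeddings are all hypothetical. It only shows that a pair which is negative on the fine level can be positive on the coarse level, and what a loss that uses positive pairs alone would average over.

```python
# Hypothetical two-level label hierarchy (an assumption for illustration).
FINE_TO_COARSE = {"sparrow": "bird", "owl": "bird", "dog": "mammal"}


def is_positive(label_a, label_b, level):
    """Return True if two fine labels share a class at the given level."""
    if level == "fine":
        return label_a == label_b
    return FINE_TO_COARSE[label_a] == FINE_TO_COARSE[label_b]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def positive_only_loss(embeddings, labels, level):
    """Mean squared distance over positive pairs only.

    Negative pairs contribute nothing, so no pair is ever pushed apart.
    """
    total, count = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if is_positive(labels[i], labels[j], level):
                total += squared_distance(embeddings[i], embeddings[j])
                count += 1
    return total / count if count else 0.0


# Toy data (hypothetical): one 2-D embedding per image.
labels = ["sparrow", "owl", "dog"]
embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]

fine_loss = positive_only_loss(embeddings, labels, "fine")
coarse_loss = positive_only_loss(embeddings, labels, "coarse")
```

In this toy batch, the sparrow and owl images form a negative pair on the fine level but a positive pair on the coarse level, so a repulsive term applied on the fine level would work against the coarse class structure. The sketch does not address how a pull-only objective avoids trivial solutions; that is outside what the abstract describes.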

Notes

  1. https://github.com/KevinMusgrave/pytorch-metric-learning

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grant 62125603 and Grant U1813218, and in part by a grant from the Beijing Academy of Artificial Intelligence (BAAI).

Author information

Corresponding author

Correspondence to Jiwen Lu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, W., Huang, Y., Zhang, B., Zhou, J., Lu, J. (2022). Dynamic Metric Learning with Cross-Level Concept Distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13684. Springer, Cham. https://doi.org/10.1007/978-3-031-20053-3_12

  • DOI: https://doi.org/10.1007/978-3-031-20053-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20052-6

  • Online ISBN: 978-3-031-20053-3

  • eBook Packages: Computer Science (R0)
