
Deep Metric Learning with False Positive Probability

Trade Off Hard Levels in a Weighted Way

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10636)

Abstract

In recent years, deep metric learning has become an end-to-end paradigm in the computer vision community due to the great success of deep learning. However, existing deep metric learning frameworks face a dilemma in trading off the hard level of training examples: the “harder” the examples we feed to neural networks, the more likely we are to obtain highly discriminative models, but the more easily the networks get stuck in poor local minima in practice. To fight this dilemma, we propose a deep metric learning method with FAlse Positive ProbabilitY (FAPPY) that gradually incorporates different hard levels. Unlike mainstream deep metric learning schemes, the presented approach optimizes the similarity probability distribution among training samples instead of the similarity itself. Experimental results on the CUB-200-2011, Stanford Online Products and VehicleID datasets show that our FAPPY method matches or outperforms state-of-the-art metric learning methods on fine-grained image retrieval and vehicle re-identification tasks. Besides, the presented method has relatively low sensitivity to hyper-parameters and requires only minor changes to traditional classification networks.
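The abstract states that FAPPY optimizes a similarity probability distribution rather than the similarities themselves. As a rough illustration of that general idea only, the sketch below estimates the probability that a randomly drawn negative pair is more similar than a randomly drawn positive pair via soft histograms of cosine similarities and uses it as a loss. Everything here (the use of PyTorch, the triangular binning, and the name `false_positive_probability_loss`) is our own assumption and not the authors' FAPPY formulation.

```python
import torch
import torch.nn.functional as F

def false_positive_probability_loss(embeddings, labels, num_bins=50):
    """Illustrative loss: probability that a negative pair scores higher than
    a positive pair, estimated with soft (triangular) histograms of cosine
    similarities. A sketch of the general idea in the abstract only."""
    # Cosine similarities of L2-normalized embeddings lie in [-1, 1].
    emb = F.normalize(embeddings, dim=1)
    sim = emb @ emb.t()

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=sim.device)
    pos_sims = sim[same & ~eye]   # positive pairs (same class, excluding self-pairs)
    neg_sims = sim[~same]         # negative pairs (different classes)

    # Soft histogram: each similarity contributes linearly to its two
    # neighbouring bin centres, keeping the estimate differentiable.
    bins = torch.linspace(-1.0, 1.0, num_bins, device=sim.device)
    delta = bins[1] - bins[0]

    def soft_hist(values):
        w = torch.clamp(1.0 - (values.unsqueeze(1) - bins.unsqueeze(0)).abs() / delta, min=0.0)
        h = w.sum(dim=0)
        return h / (h.sum() + 1e-12)

    h_pos = soft_hist(pos_sims)
    h_neg = soft_hist(neg_sims)

    # Estimate P(similarity of a negative pair >= similarity of a positive pair):
    # sum over bins of P(negative in bin t) * P(positive <= bin t).
    cdf_pos = torch.cumsum(h_pos, dim=0)
    return (h_neg * cdf_pos).sum()
```

Because every pair in the batch contributes to the histograms, a loss of this shape aggregates easy and hard pairs in a weighted way instead of mining a fixed hard level, which is the trade-off the abstract refers to.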


Notes

  1.

    Recall@K is the average recall score over all query images in the test set. For each query image, the recall score is 1 if at least one positive image appears among the K nearest returned images and 0 otherwise (see the sketch below).
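As a concrete reading of this definition, here is a minimal sketch of the metric; the function and variable names are ours, and the inputs are assumed to be NumPy arrays of L2-normalizable features and integer labels.

```python
import numpy as np

def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1):
    """Recall@K as defined in the note: for each query, the score is 1 if at
    least one of the K nearest gallery images shares the query's label, and
    0 otherwise; the metric is the mean score over all queries."""
    # Cosine similarity between L2-normalized feature vectors.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                          # shape: (num_queries, num_gallery)

    hits = 0
    for sim_row, label in zip(sims, query_labels):
        top_k = np.argsort(-sim_row)[:k]    # indices of the K most similar gallery images
        hits += int(np.any(gallery_labels[top_k] == label))
    return hits / len(query_labels)
```

In the retrieval protocol typically used on these datasets, the test set is queried against itself with each query excluded from its own ranking; the sketch assumes separate query and gallery sets for brevity.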


Acknowledgements

This work is partially supported by the National Science Foundation of China (No. U1611461), the Shenzhen Peacock Plan (20130408-183003656), the Science and Technology Planning Project of Guangdong Province (No. 2014B090910001) and the National Natural Science Foundation of China (61602014). In addition, we would like to thank the Guangzhou Supercomputer Center for providing the Tianhe-2 system used to conduct the experiments and for its technical support.

Author information


Corresponding author

Correspondence to Ge Li.


Appendix A

Lemma 1.

Proof.

We compute the gradients with respect to these pairwise similarities:

$$ \frac{\partial P_{ij}^{\Delta}}{\partial s_{ij}} = h_{i}^{t} + h_{j}^{t}, \quad \frac{\partial P_{ij}^{\Delta}}{\partial s_{ik}} = \left\{ \begin{array}{ll} -h_{ij}^{right}, & s_{ik} \in \left[ b_{t}, b_{t+1} \right] \\ 0, & \text{otherwise} \end{array} \right., \quad \frac{\partial P_{ij}^{\Delta}}{\partial s_{jl}} = \left\{ \begin{array}{ll} -h_{ij}^{right}, & s_{jl} \in \left[ b_{t}, b_{t+1} \right] \\ 0, & \text{otherwise} \end{array} \right. $$

Strictly speaking, the necessary and sufficient condition for \( \frac{\partial P_{ij}^{\Delta}}{\partial s_{ik}} \ne 0 \) is \( s_{ik}, s_{ij} \in \left( b_{t}, b_{t+1} \right] \). Likewise, \( \frac{\partial P_{ij}^{\Delta}}{\partial s_{jl}} \ne 0 \) if and only if \( s_{jl}, s_{ij} \in \left( b_{t}, b_{t+1} \right] \).
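The bin-local support of these gradients is what one would expect if each similarity enters \( P_{ij}^{\Delta} \) only through a piecewise-linear soft-binning weight over the bin \( \left[ b_{t}, b_{t+1} \right] \). The following is a sketch under that assumption; the weight \( \delta_{t} \) is our notation, not the paper's:

$$ \delta_{t}(s) = \left\{ \begin{array}{ll} \dfrac{s - b_{t}}{b_{t+1} - b_{t}}, & s \in \left[ b_{t}, b_{t+1} \right] \\ 0, & \text{otherwise} \end{array} \right., \quad \frac{\partial \delta_{t}(s)}{\partial s} = \left\{ \begin{array}{ll} \dfrac{1}{b_{t+1} - b_{t}}, & s \in \left[ b_{t}, b_{t+1} \right] \\ 0, & \text{otherwise} \end{array} \right. $$

Any term of \( P_{ij}^{\Delta} \) composed of such weights therefore has a nonzero derivative with respect to \( s_{ik} \) only when \( s_{ik} \) falls into the bin under consideration, which is consistent with the case analysis above.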

The Equivalency

Proof.

In the L2-normalized space, the feature vectors are defined as \( \overrightarrow{f_{x}^{L2}} = \frac{\overrightarrow{f_{x}}}{\left\| \overrightarrow{f_{x}} \right\|} \).

The Euclidean distance \( d_{xy}^{L2} \) and \( \Delta^{L2} \) are as follows:

$$ d_{xy}^{L2} = \left\| \overrightarrow{f_{x}^{L2}} - \overrightarrow{f_{y}^{L2}} \right\| = \left\| \frac{\overrightarrow{f_{x}}}{\left\| \overrightarrow{f_{x}} \right\|} - \frac{\overrightarrow{f_{y}}}{\left\| \overrightarrow{f_{y}} \right\|} \right\| = \sqrt{1 - 2\cos \left\langle \overrightarrow{f_{x}}, \overrightarrow{f_{y}} \right\rangle + 1} = \sqrt{2 - 2 s_{xy}} $$
$$ \Delta^{L2} = d_{t}^{L2} - d_{t+1}^{L2} = \sqrt{2 - 2 b_{t}} - \sqrt{2 - 2 b_{t+1}} $$
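For completeness, the first identity above follows by expanding the squared distance between the unit-norm features, using \( \overrightarrow{f_{x}^{L2}} \cdot \overrightarrow{f_{y}^{L2}} = \cos \left\langle \overrightarrow{f_{x}}, \overrightarrow{f_{y}} \right\rangle = s_{xy} \):

$$ \left\| \overrightarrow{f_{x}^{L2}} - \overrightarrow{f_{y}^{L2}} \right\|^{2} = \left\| \overrightarrow{f_{x}^{L2}} \right\|^{2} - 2\, \overrightarrow{f_{x}^{L2}} \cdot \overrightarrow{f_{y}^{L2}} + \left\| \overrightarrow{f_{y}^{L2}} \right\|^{2} = 1 - 2 s_{xy} + 1 = 2 - 2 s_{xy} $$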

Therefore, \( \Delta^{L2} \) is positively related to \( \Delta \), while \( d_{xy}^{L2} \) is negatively related to \( s_{xy} \), so the explanation in Fig. 4 is reasonable.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhong, JX., Li, G., Li, N. (2017). Deep Metric Learning with False Positive Probability. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_66


  • DOI: https://doi.org/10.1007/978-3-319-70090-8_66


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70089-2

  • Online ISBN: 978-3-319-70090-8
