Abstract
In recent years, driven by the success of deep learning, deep metric learning has become a popular end-to-end paradigm in the computer vision community. However, existing deep metric learning frameworks face a dilemma in choosing the hardness of training examples: the harder the examples fed to the network, the more discriminative the learned model is likely to be, but the more easily the network gets stuck in poor local minima in practice. To resolve this dilemma, we propose a deep metric learning method based on FAlse Positive ProbabilitY (FAPPY) that gradually incorporates different hardness levels. Unlike mainstream deep metric learning schemes, the proposed approach optimizes the similarity probability distribution among training samples rather than the similarity itself. Experimental results on the CUB-200-2011, Stanford Online Products, and VehicleID datasets show that FAPPY matches or outperforms state-of-the-art metric learning methods on fine-grained image retrieval and vehicle re-identification. Moreover, the proposed method is relatively insensitive to its hyper-parameters and requires only minor changes to conventional classification networks.
Notes
- 1.
Recall@K is the recall score averaged over all query images in the test set. For each query image, the recall score is 1 if at least one positive image appears among the K nearest returned images, and 0 otherwise.
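The metric above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the paper; the function name `recall_at_k` and the toy label arrays are our own.

```python
def recall_at_k(query_labels, neighbor_labels, k):
    """Recall@K: fraction of queries whose K nearest retrieved images
    contain at least one image sharing the query's label.

    query_labels:    length-Q sequence, label of each query image
    neighbor_labels: Q sequences of retrieved-image labels, sorted by
                     increasing distance (the query itself excluded)
    """
    hits = 0
    for q_label, ranked in zip(query_labels, neighbor_labels):
        if q_label in ranked[:k]:  # recall score 1 for this query
            hits += 1
    return hits / len(query_labels)

# toy check: 2 of 3 queries have a same-label image in their top 2
queries = [0, 1, 2]
ranked = [[0, 3, 1],   # hit at rank 1
          [3, 1, 0],   # hit at rank 2
          [0, 1, 3]]   # miss within top 2
print(recall_at_k(queries, ranked, k=2))  # 0.666...
```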
Acknowledgements
This work is partially supported by the National Science Foundation of China (No. U1611461), the Shenzhen Peacock Plan (20130408-183003656), the Science and Technology Planning Project of Guangdong Province (No. 2014B090910001), and the National Natural Science Foundation of China (61602014). In addition, we would like to thank the Guangzhou Supercomputer Center for providing the Tianhe-2 system used in our experiments and for its technical support.
Appendix A
Lemma 1.
Proof.
We compute the gradients of these pairs:
Strictly speaking, the necessary and sufficient condition for \( \frac{\partial P_{ij}^{\Delta}}{\partial s_{ik}} \ne 0 \) is \( s_{ik}, s_{ij} \in \left( b_{t}, b_{t+1} \right] \). Likewise, \( \frac{\partial P_{ij}^{\Delta}}{\partial s_{jl}} \ne 0 \) if and only if \( s_{jl}, s_{ij} \in \left( b_{t}, b_{t+1} \right] \).
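The sparsity condition above can be checked numerically: the gradient with respect to a negative similarity \( s_{ik} \) is nonzero only when \( s_{ik} \) falls into the same half-open histogram bin \( (b_t, b_{t+1}] \) as the positive similarity \( s_{ij} \). The following sketch is illustrative only; the helper `same_bin` and the chosen bin edges are our own, not part of the paper.

```python
import numpy as np

def same_bin(s_a, s_b, edges):
    """True iff s_a and s_b lie in the same half-open bin (b_t, b_{t+1}].

    With side='left', np.searchsorted maps any s in (b_t, b_{t+1}]
    to index t+1, so equal indices mean the same half-open bin.
    """
    return (np.searchsorted(edges, s_a, side='left')
            == np.searchsorted(edges, s_b, side='left'))

edges = np.linspace(-1.0, 1.0, 11)  # 10 equal bins over cosine similarity
print(same_bin(0.31, 0.35, edges))  # True:  both in (0.2, 0.4]
print(same_bin(0.31, 0.45, edges))  # False: different bins, gradient is 0
```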
The Equivalency
Proof.
In the L2-normalized space, the feature vectors are defined as \( \overrightarrow{f_{x}^{L2}} = \frac{\overrightarrow{f_{x}}}{\left\| \overrightarrow{f_{x}} \right\|} \).
The Euclidean distance \( d_{xy}^{L2} \) and \( \Delta^{L2} \) are as follows:
Therefore, \( \Delta^{L2} \) is positively related to \( \Delta \), while \( d_{xy}^{L2} \) is negatively related to \( s_{xy} \), so the explanation in Fig. 4 holds.
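The monotone relation between \( d_{xy}^{L2} \) and \( s_{xy} \) follows from the identity \( (d_{xy}^{L2})^2 = 2 - 2 s_{xy} \) for unit vectors, which a quick numerical sketch confirms (the random 64-dimensional vectors here are just a toy example):

```python
import numpy as np

# For L2-normalized vectors, ||fx - fy||^2 = 2 - 2 * <fx, fy>,
# so Euclidean distance decreases monotonically as cosine
# similarity s_xy increases.
rng = np.random.default_rng(0)
fx, fy = rng.normal(size=64), rng.normal(size=64)
fx, fy = fx / np.linalg.norm(fx), fy / np.linalg.norm(fy)

s_xy = fx @ fy                  # cosine similarity of unit vectors
d_xy = np.linalg.norm(fx - fy)  # Euclidean distance
print(np.isclose(d_xy ** 2, 2 - 2 * s_xy))  # True
```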
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zhong, JX., Li, G., Li, N. (2017). Deep Metric Learning with False Positive Probability. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science(), vol 10636. Springer, Cham. https://doi.org/10.1007/978-3-319-70090-8_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70089-2
Online ISBN: 978-3-319-70090-8