SSCRL: fine-grained object retrieval with switched shifted centralized ranking loss

Abstract

Image retrieval is a central task in computer vision that aims to browse, search, and return images from a large database of digital images in response to a retrieval query. Many works have focused on fine-grained object retrieval (FGOR) because it is extremely challenging and of great practical value. Fine-grained data exhibit large intra-class diversity and small inter-class diversity, so a powerful feature extractor such as a convolutional neural network (CNN) is needed to obtain fine-grained features that distinguish the subtle variations between classes. As an indispensable part of a CNN model, the loss function is of critical importance for feature extraction. In this work, building on the global structure loss function, we propose a variant of the softmax loss, named the switched shifted softmax loss, to reduce overfitting of the model. Comparative experiments with different backbone architectures verify that the developed loss function, despite involving only a simple transformation, enhances the fine-grained retrieval performance of deep learning methods. Furthermore, additional experiments on fine-grained object classification and person re-identification (re-ID) show that our method applies to a wide spectrum of other tasks.
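
As a concrete illustration of the general idea, the PyTorch sketch below shows one way a shifted softmax cross-entropy term could be implemented. The margin beta, the per-sample switch on the ground-truth probability, and the helper name shifted_softmax_loss are illustrative assumptions, not the exact SSCRL formulation, which is defined in the paper and its released code.

```python
import torch
import torch.nn.functional as F

def shifted_softmax_loss(logits, targets, beta=0.5):
    """Illustrative sketch (assumed form, not the exact SSCRL definition):
    shift the ground-truth logit of confidently classified samples by beta
    before applying softmax cross-entropy."""
    with torch.no_grad():
        probs = F.softmax(logits, dim=1)
        target_prob = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        switch = (target_prob > beta).float()  # switch the shift on per sample
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    shifted = logits - (beta * switch).unsqueeze(1) * one_hot  # shift only the target logit
    return F.cross_entropy(shifted, targets)

# Usage sketch: 200 classes, as in CUB-200-2011
logits = torch.randn(8, 200, requires_grad=True)
targets = torch.randint(0, 200, (8,))
loss = shifted_softmax_loss(logits, targets, beta=0.5)
loss.backward()
```

Under this assumed form, subtracting beta from an already confident ground-truth logit keeps the loss from vanishing for easy training samples, which is one way such a shift could counter overfitting.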

Notes

  1. http://www.vision.caltech.edu/visipedia/CUB-200-2011.html

  2. https://ai.stanford.edu/~jkrause/cars/car_dataset.html

  3. https://pytorch.org

  4. The values of α and λ are the same as in previous works [46, 52], and a switched value β ∈ [0.4, 0.6] obtains similar performance.

  5. https://github.com/KaiyangZhou/deep-person-reid

References

  1. Bell S, Bala K (2015) Learning visual similarity for product design with convolutional neural networks. ACM Trans Graph (TOG) 34(4):98

  2. Deng C, Liu X, Mu Y, Li J (2015) Large-scale multi-task image labeling with adaptive relevance discovery and feature hashing. Signal Process 112:137–145

  3. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition. IEEE, pp 248–255

  4. Dubey A, Gupta O, Guo P, Raskar R, Farrell R, Naik N (2018) Pairwise confusion for fine-grained visual classification. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 70–86

  5. Dubey A, Gupta O, Raskar R, Naik N (2018) Maximum-entropy fine grained classification. In: Advances in neural information processing systems, pp 637–647

  6. Golik P, Doetsch P, Ney H (2013) Cross-entropy vs. squared error training: a theoretical and experimental comparison. In: Interspeech, vol 13, pp 1756–1760

  7. Gudivada VN, Raghavan VV (1995) Content based image retrieval systems. Computer 28(9):18–22

  8. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2961–2969

  9. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  10. Hoi SC, Liu W, Chang SF (2010) Semi-supervised distance metric learning for collaborative image retrieval and clustering. ACM Trans Multimed Comput Commun Appl (TOMM) 6(3):1–26

  11. Huang C, Loy CC, Tang X (2016) Local similarity-aware deep feature embedding. In: Advances in neural information processing systems, pp 1262–1270

  12. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

  13. Jain AK, Vailaya A (1996) Image retrieval using color and shape. Pattern Recogn 29(8):1233–1244

  14. Khosla A, Jayadevaprakash N, Yao B, Li FF (2011) Novel dataset for fine-grained image categorization: Stanford dogs. In: Proc. CVPR workshop on fine-grained visual categorization (FGVC), vol 2

  15. Krause J, Stark M, Deng J, Fei-Fei L (2013) 3d object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp 554–561

  16. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  17. Li C, Deng C, Wang L, Xie D, Liu X (2019) Coupled cyclegan: Unsupervised hashing network for cross-modal retrieval. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 33, pp 176–183

  18. Liu W, Wang J, Ji R, Jiang YG, Chang SF (2012) Supervised hashing with kernels. In: 2012 IEEE Conference on computer vision and pattern recognition. IEEE, pp 2074–2081

  19. Liu Z, Li H, Zhou W, Zhao R, Tian Q (2014) Contextual hashing for large-scale image search. IEEE Trans Image Process 23(4):1606–1614

  20. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  21. Maji S, Kannala J, Rahtu E, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. Technical report

  22. Nilsback M, Zisserman A (2006) A visual vocabulary for flower classification. In: 2006 IEEE Computer society conference on computer vision and pattern recognition (CVPR’06), vol 2, pp 1447–1454. https://doi.org/10.1109/CVPR.2006.42

  23. Oh Song H, Jegelka S, Rathod V, Murphy K (2017) Deep metric learning via facility location. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5382–5390

  24. Oh Song H, Xiang Y, Jegelka S, Savarese S (2016) Deep metric learning via lifted structured feature embedding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4004–4012

  25. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987

  26. Radenović F, Tolias G, Chum O (2018) Fine-tuning cnn image retrieval with no human annotation. IEEE Trans Pattern Anal Mach Intell 41(7):1655–1668

  27. Schroff F, Kalenichenko D, Philbin J (2015) Facenet: a unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823

  28. Shi W, Gong Y, Tao X, Cheng D, Zheng N (2018) Fine-grained image classification using modified DCNNs trained by cascaded softmax and generalized large-margin losses. IEEE Trans Neural Netw Learn Syst 30(3):683–694

  29. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference on Learning Representations

  30. Sohn K (2016) Improved deep metric learning with multi-class n-pair loss objective. In: Advances in neural information processing systems, pp 1857–1865

  31. Su X, Liu Z, Zhang Y, Chen CP (2019) Event-triggered adaptive fuzzy tracking control for uncertain nonlinear systems preceded by unknown Prandtl-Ishlinskii hysteresis. IEEE Trans Cybern 51(6):2979–2992

  32. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826

  33. Ustinova E, Lempitsky V (2016) Learning deep embeddings with histogram loss. In: Advances in neural information processing systems, pp 4170–4178

  34. Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset

  35. Wang H, Wang Y, Zhou Z, Ji X, Gong D, Zhou J, Li Z, Liu W (2018) Cosface: Large margin cosine loss for deep face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5265–5274

  36. Wang J, Song Y, Leung T, Rosenberg C, Wang J, Philbin J, Chen B, Wu Y (2014) Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1386–1393

  37. Wei L, Zhang S, Gao W, Tian Q (2018) Person transfer gan to bridge domain gap for person re-identification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 79–88

  38. Wei XS, Luo J, Wu J, Zhou ZH (2017) Selective convolutional descriptor aggregation for fine-grained image retrieval. IEEE Trans Image Process 26(6):2868–2881

  39. Wei XS, Wu J, Cui Q (2019) Deep learning for fine-grained image analysis: A survey. arXiv:1907.03069

  40. Xie L, Wang J, Zhang B, Tian Q (2015) Fine-grained image search. IEEE Trans Multimed 17(5):636–647

  41. Xu B, Bu J, Chen C, Cai D, He X, Liu W, Luo J (2011) Efficient manifold ranking for image retrieval. In: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pp 525–534

  42. Yi D, Lei Z, Li S (2014) Deep metric learning for practical person re-identification. ArXiv e-prints

  43. Yuan L, Wang T, Zhang X, Tay FE, Jie Z, Liu W, Feng J (2020) Central similarity quantization for efficient image and video retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3083–3092

  44. Yuan X, Yu J, Qin Z, Wan T (2011) A sift-lbp image retrieval model based on bag of features. In: IEEE International conference on image processing, pp 1061–1064

  45. Zeng X, Wang X, Chen K, Zhang Y, Li D (2019) Dividing the neighbors is not enough: adding confusion makes local descriptor stronger. IEEE Access 7:136106–136115

  46. Zeng X, Zhang Y, Wang X, Chen K, Li D, Yang W (2020) Fine-grained image retrieval via piecewise cross entropy loss. Image Vis Comput 93:103820

  47. Zhang S, Yang M, Wang X, Lin Y, Tian Q (2015) Semantic-aware co-indexing for image retrieval. IEEE Trans Pattern Anal Mach Intell 37(12):2573–2587

  48. Zhang X, Zhou F, Lin Y, Zhang S (2016) Embedding label structures for fine-grained feature representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1114–1123

  49. Zheng L, Wang S, Tian Q (2014) Coupled binary embedding for large-scale image retrieval. IEEE Trans Image Process 23(8):3368–3380

  50. Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: Proceedings of the IEEE international conference on computer vision, pp 1116–1124

  51. Zheng X, Ji R, Sun X, Wu Y, Huang F, Yang Y (2018) Centralized ranking loss with weakly supervised localization for fine-grained object retrieval. In: IJCAI, pp 1226–1233

  52. Zheng X, Ji R, Sun X, Zhang B, Wu Y, Huang F (2019) Towards optimal fine grained retrieval via decorrelated centralized loss with normalize-scale layer

  53. Zheng Z, Zheng L, Yang Y (2017) Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3754–3762

  54. Zhou K, Xiang T (2019) Torchreid: A library for deep learning person re-identification in pytorch. arXiv:1910.10093

  55. Zhou K, Yang Y, Cavallaro A, Xiang T (2019) Omni-scale feature learning for person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 3702–3712

Acknowledgments

This work was supported by the Ph.D. Start-up Fund of Guangdong Polytechnic Normal University (991641258 and 991641231), the Guangzhou Science and Technology Program (105130372030), the National Natural Science Foundation of China (61803090), and the Natural Science Foundation of Guangdong Province (2019A1515012109). We thank Prof. Rongjun Chen for his professional advice on this work, and we thank the associate editor and the reviewers for their time and evaluation, which greatly helped us improve the quality and presentation of the paper.

Corresponding author

Correspondence to Xiaodong Wang.

The code is available at https://github.com/Zengxianxian727/FGOR

Cite this article

Zeng, X., Liu, S., Wang, X. et al. SSCRL: fine-grained object retrieval with switched shifted centralized ranking loss. Appl Intell 53, 336–350 (2023). https://doi.org/10.1007/s10489-022-03287-9
