Leveraging Two-Scale Features to Enhance Fine-Grained Object Retrieval

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1961)

Abstract

Constructing a discriminative image embedding from features extracted by a convolutional neural network (CNN) has become a common solution for fine-grained object retrieval (FGOR). However, existing methods build the embedding solely from the features of the last CNN layer and neglect the potential benefits of features from other layers. Because different CNN layers capture different levels of abstraction and semantic information, we believe that combining features from multiple layers can yield a more discriminative embedding. We therefore propose a simple yet efficient end-to-end model named TSF-Enhance, which leverages two-scale features extracted by the CNN to construct the embedding. Specifically, we extract features from the third and fourth layers of ResNet50 and construct a separate embedding from each. At test time, we concatenate the two embeddings to obtain a more discriminative embedding for retrieval. Additionally, we design a Feature Enhancement Module (FEM), composed of several common operations such as layer normalization, to process the features. Our model achieves competitive results on three FGOR datasets and exceeds the current state-of-the-art performance on the most challenging one, CUB200. Furthermore, it demonstrates strong scalability compared to localization-based methods, achieving the best performance on two general-purpose image retrieval datasets. The source code is available at https://github.com/jingyj203/TSF-Enhance.
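The test-time idea in the abstract (one embedding per scale, concatenated for retrieval) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, which is in the linked repository. The abstract only states that the FEM includes layer normalization, so the global average pooling, the L2 normalization, the dot-product ranking, and every function name below are assumptions chosen for illustration. Feature maps are represented as plain lists of channels.

```python
import math


def global_avg_pool(feature_map):
    """Average each channel's spatial activations (pooling is an assumption)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def layer_norm(x, eps=1e-5):
    """Zero-mean, unit-variance normalization without learned scale or shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit length (assumed here so both scales weigh equally)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def embed(feature_map):
    """Hypothetical stand-in for one branch: pool, layer-normalize, L2-normalize."""
    return l2_normalize(layer_norm(global_avg_pool(feature_map)))


def two_scale_embedding(feat_stage3, feat_stage4):
    """Concatenate the per-scale embeddings, as the abstract describes for testing."""
    return embed(feat_stage3) + embed(feat_stage4)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending dot-product similarity."""
    scores = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy feature maps: 4 channels for the earlier stage, 3 for the later one.
    query = two_scale_embedding(
        [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0], [0.5, 0.5]],
        [[2.0, 2.0], [0.0, 1.0], [4.0, 0.0]],
    )
    other = two_scale_embedding(
        [[0.0, 0.0], [3.0, 3.0], [0.0, 1.0], [2.0, 2.0]],
        [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]],
    )
    print(rank_gallery(query, [other, query]))
```

Because each branch is normalized to unit length before concatenation, both scales contribute equally to the similarity score in this sketch. Whether the actual model weights the two scales this way is not stated in the abstract.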

Acknowledgements

This work is supported by the Sichuan Provincial Social Science Programs Project under Grant SC22EZD065 and the Fundamental Research Funds for the Central Universities under Grants XGBDFZ04 and ZYGX2019F005. We also sincerely thank the anonymous reviewers for their valuable recommendations and thoughtful feedback.

Author information

Corresponding author

Correspondence to Shenglin Gui.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Jing, Y., Gui, S. (2024). Leveraging Two-Scale Features to Enhance Fine-Grained Object Retrieval. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1961. Springer, Singapore. https://doi.org/10.1007/978-981-99-8126-7_16

  • DOI: https://doi.org/10.1007/978-981-99-8126-7_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8125-0

  • Online ISBN: 978-981-99-8126-7

  • eBook Packages: Computer Science (R0)