Leveraging Two-Scale Features to Enhance Fine-Grained Object Retrieval

Jing, Yingjie; Gui, Shenglin

doi:10.1007/978-981-99-8126-7_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1961)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Constructing a discriminative embedding for an image from features extracted by a convolutional neural network (CNN) has become a common solution for fine-grained object retrieval (FGOR). However, existing methods construct the embedding solely from features extracted by the last layer of the CNN, neglecting the potential benefits of features from other layers. Because features extracted by different CNN layers carry different levels of abstraction and semantic information, we believe that leveraging features from multiple layers can yield a more discriminative embedding. Building on this, we propose a simple yet efficient end-to-end model named TSF-Enhance, which leverages two-scale features extracted by the CNN to construct the discriminative embedding. Specifically, we extract features from the third and fourth layers of ResNet50 and construct a separate embedding from each. At test time, we concatenate these two embeddings to obtain a more discriminative embedding for retrieval. Additionally, we design a Feature Enhancement Module (FEM) that consists of several common operations, such as layer normalization, to process the features. We achieve competitive results on three FGOR datasets, exceeding the current state-of-the-art performance on the most challenging dataset, CUB200. Furthermore, our model demonstrates strong scalability compared to localization-based methods, achieving the best performance on two general-purpose image retrieval datasets. The source code is available at https://github.com/jingyj203/TSF-Enhance.
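
The test-time step described in the abstract (one embedding per scale, concatenated for retrieval) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation; their code is at the repository linked above. The abstract only states that the FEM includes layer normalization among other common operations, so the global average pooling, the L2 normalization, the cosine-similarity ranking, and every function name and toy feature map below are assumptions made for illustration.

```python
import math

def global_avg_pool(feature_map):
    """Collapse a C x N map (C channels, N spatial positions) to C values.
    Assumption: the abstract does not say which pooling is used."""
    return [sum(channel) / len(channel) for channel in feature_map]

def layer_norm(vec, eps=1e-5):
    """Layer normalization without learned scale/shift parameters."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def embed(feature_map):
    """Hypothetical per-scale embedding: pool, layer-normalize, L2-normalize."""
    return l2_normalize(layer_norm(global_avg_pool(feature_map)))

def two_scale_embedding(stage3_map, stage4_map):
    """Concatenate the two per-scale embeddings, as done at test time."""
    return embed(stage3_map) + embed(stage4_map)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

# Toy feature maps standing in for the two CNN stages (made-up numbers):
# image A has a 3-channel "stage 3" map and a 4-channel "stage 4" map.
stage3_a = [[1.0, 2.0], [0.0, 1.0], [3.0, 3.0]]
stage4_a = [[2.0], [0.0], [1.0], [4.0]]
stage3_b = [[0.0, 0.0], [2.0, 4.0], [1.0, 1.0]]
stage4_b = [[4.0], [1.0], [0.0], [2.0]]

query = two_scale_embedding(stage3_a, stage4_a)
gallery = [
    two_scale_embedding(stage3_b, stage4_b),  # index 0: a different image
    two_scale_embedding(stage3_a, stage4_a),  # index 1: same image as query
]
print(retrieve(query, gallery))  # → [1, 0]
```

Because each per-scale embedding is normalized before concatenation in this sketch, both scales contribute equally to the similarity score; whether the actual model weights them this way is not stated in the abstract.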

Acknowledgements

This work is supported by the Sichuan Provincial Social Science Programs Project under Grant SC22EZD065 and the Fundamental Research Funds for the Central Universities under Grants XGBDFZ04 and ZYGX2019F005. We also thank the anonymous reviewers for their valuable recommendations and thoughtful feedback.

Author information

Authors and Affiliations

University of Electronic Science and Technology of China, Chengdu, 611731, China
Yingjie Jing & Shenglin Gui

Corresponding author

Correspondence to Shenglin Gui.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Jing, Y., Gui, S. (2024). Leveraging Two-Scale Features to Enhance Fine-Grained Object Retrieval. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1961. Springer, Singapore. https://doi.org/10.1007/978-981-99-8126-7_16

DOI: https://doi.org/10.1007/978-981-99-8126-7_16
Published: 13 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8125-0
Online ISBN: 978-981-99-8126-7
eBook Packages: Computer Science, Computer Science (R0)
