Abstract
Constructing a discriminative image embedding from features extracted by a convolutional neural network (CNN) has become a common approach to fine-grained object retrieval (FGOR). However, existing methods build the embedding solely from the features of the last CNN layer and neglect the potential benefits of features from other layers. Because different CNN layers capture different levels of abstraction and semantic information, we argue that combining features from multiple layers yields a more discriminative embedding. We therefore propose TSF-Enhance, a simple yet efficient end-to-end model that uses two-scale features extracted by the CNN to construct the embedding. Specifically, we extract features from the third and fourth layers of ResNet50 and construct a separate embedding from each. At test time, we concatenate the two embeddings into a single, more discriminative embedding for retrieval. In addition, we design a Feature Enhancement Module (FEM), composed of several common operations such as layer normalization, to process the features. Our model achieves competitive results on three FGOR datasets and exceeds the current state-of-the-art performance on the most challenging one, CUB200. It also shows strong scalability compared with localization-based methods, achieving the best performance on two general-purpose image retrieval datasets. The source code is available at https://github.com/jingyj203/TSF-Enhance.
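The test-time procedure described in the abstract (one embedding per scale, concatenated for retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature-map layout (a list of channels, each a flat list of spatial activations), and the use of average pooling, parameter-free layer normalization, and L2 normalization as a stand-in for the FEM are all assumptions made for illustration. Any learned layers are omitted.

```python
import math


def global_avg_pool(feature_map):
    """Average each channel over its spatial positions (assumed pooling step)."""
    return [sum(channel) / len(channel) for channel in feature_map]


def layer_norm(vec, eps=1e-5):
    """Parameter-free layer normalization: zero mean, unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def embed(feature_map):
    """Hypothetical stand-in for one embedding branch: pool, layer-norm, L2-normalize."""
    return l2_normalize(layer_norm(global_avg_pool(feature_map)))


def two_scale_embedding(feat_layer3, feat_layer4):
    """Concatenate the per-scale embeddings, as done at test time in the abstract."""
    return embed(feat_layer3) + embed(feat_layer4)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, k=1):
    """Return the indices of the k gallery embeddings most similar to the query."""
    ranked = sorted(range(len(gallery)),
                    key=lambda i: cosine(query, gallery[i]),
                    reverse=True)
    return ranked[:k]
```

In a real setting, the two inputs would be the activation maps taken from the third and fourth stages of a ResNet50 backbone; here they are plain nested lists, so the sketch runs without any deep learning library. Because each branch is L2-normalized before concatenation, both scales contribute equally to the cosine similarity used for ranking.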
Acknowledgements
This work is supported by the Sichuan Provincial Social Science Programs Project under Grant SC22EZD065 and the Fundamental Research Funds for the Central Universities under Grants XGBDFZ04 and ZYGX2019F005. We also sincerely thank the anonymous reviewers for their valuable recommendations and thoughtful feedback.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jing, Y., Gui, S. (2024). Leveraging Two-Scale Features to Enhance Fine-Grained Object Retrieval. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1961. Springer, Singapore. https://doi.org/10.1007/978-981-99-8126-7_16
DOI: https://doi.org/10.1007/978-981-99-8126-7_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8125-0
Online ISBN: 978-981-99-8126-7
eBook Packages: Computer Science, Computer Science (R0)