DOI: 10.1145/3664524.3675365
research-article

Deep Fisher-Vector Descriptors for Image Retrieval and Scene Recognition

Published: 28 August 2024

Abstract

This study presents a novel architecture for large-scale image retrieval and recognition. We introduce a multi-stream Fisher vector network that integrates a convolutional neural network (CNN) with a Fisher Vector (FV) framework to optimize feature extraction and aggregation. The CNN component generates dense deep convolutional descriptors, which the Fisher Vector method then aggregates to improve recognition accuracy. Importantly, the CNN and Fisher Vector model parameters are learnt simultaneously in an end-to-end manner, which allows the model to account for the evolving distribution of deep descriptors over the course of the learning process. This integrated learning strategy yields a robust model that performs well on both image retrieval and recognition tasks, as demonstrated on standard datasets.
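To make the aggregation step concrete, the following is a minimal NumPy sketch of Fisher Vector encoding over a set of dense descriptors. It assumes the standard improved Fisher Vector formulation with a diagonal-covariance Gaussian mixture model (gradients with respect to means and standard deviations, followed by power and L2 normalisation); the abstract does not state which exact formulation the paper uses. The function name `fisher_vector`, the fixed GMM parameters, and the random stand-in descriptors are all illustrative assumptions. In the paper the mixture parameters are learnt jointly with the CNN, whereas this sketch covers only the forward pass with fixed parameters.

```python
import numpy as np


def fisher_vector(x, weights, means, sigmas):
    """Encode descriptors x (T, D) with a diagonal GMM of K components.

    weights: (K,) mixture weights; means, sigmas: (K, D) means and
    standard deviations. Returns a vector of length 2 * K * D.
    Illustrative sketch only, not the authors' implementation.
    """
    T, D = x.shape
    # Deviation of every descriptor from every component, scaled by sigma: (T, K, D)
    z = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of (mixture weight * diagonal Gaussian density): (T, K)
    log_p = (-0.5 * np.sum(z ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2.0 * np.pi)
             + np.log(weights)[None, :])
    # Soft assignments (posteriors), computed with a max-shift for stability
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Gradient statistics with respect to means and standard deviations: (K, D) each
    g_mu = np.einsum('tk,tkd->kd', gamma, z) / (T * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum('tk,tkd->kd', gamma, z ** 2 - 1.0)
               / (T * np.sqrt(2.0 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Power (signed square-root) normalisation, then L2 normalisation
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Stand-in for a 7x7 CNN feature map with 8 channels, flattened to 49 descriptors
rng = np.random.default_rng(0)
T, D, K = 49, 8, 4
descriptors = rng.normal(size=(T, D))
weights = np.full(K, 1.0 / K)      # hypothetical uniform mixture weights
means = rng.normal(size=(K, D))    # hypothetical component means
sigmas = np.ones((K, D))           # hypothetical unit standard deviations

fv = fisher_vector(descriptors, weights, means, sigmas)
print(fv.shape)
```

Every operation here is differentiable in the descriptors and in the mixture parameters, which is the property that makes joint end-to-end training of the two components possible in principle.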


    Published In

    MVRMLM '24: Proceedings of 2024 ACM ICMR Workshop on Multimodal Video Retrieval
    June 2024
    56 pages
    ISBN:9798400706844
    DOI:10.1145/3664524

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Deep Learning
    2. Descriptor Aggregation
    3. Fisher Vector
    4. Global Descriptors
    5. Image Retrieval
    6. Scene Recognition
    7. Visual Search

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICMR '24
