Fine-Grained Image Retrieval via Piecewise Cross Entropy loss

https://doi.org/10.1016/j.imavis.2019.10.006

Highlights

  • Fine-Grained Image Retrieval is an important problem in computer vision.

  • In this paper, PCE loss is proposed for Fine-Grained Image Retrieval.

  • Due to the proposed loss, our model obtains SOTA performances on two benchmarks.

Abstract

Fine-Grained Image Retrieval is an important problem in computer vision. It is more challenging than generic content-based image retrieval because the diversity between different classes is small while the diversity within the same class is large. Recently, the cross entropy loss has been used to make a Convolutional Neural Network (CNN) generate discriminative features for Fine-Grained Image Retrieval, and further improvement can be obtained with extra operations such as a Normalize-Scale layer. In this paper, we propose a variant of the cross entropy loss, named the Piecewise Cross Entropy loss function, to enhance model generalization and promote retrieval performance. Moreover, the Piecewise Cross Entropy loss is easy to implement. We evaluate the proposed scheme on two standard fine-grained retrieval benchmarks and obtain significant improvements over the state-of-the-art, with gains of 11.8% and 3.3% over previous work on CARS196 and CUB-200-2011, respectively.

Introduction

Image Retrieval (IR) [[1], [2], [3], [4]] is a popular problem in computer vision that requires retrieving images containing object instances of the same variety. Fine-Grained Image Retrieval (FGIR) goes further: it searches for images of subordinate categories within the same visual category, such as cars [5], birds [6] and products [7]. FGIR is a challenging task because the classes are similar to each other while images within a class can differ greatly in pose, illumination and viewpoint, and it has attracted increasing research attention. Solving this problem demands a discriminative feature that can capture the subtle differences among fine-grained categories. A recent trend is to adopt a convolutional neural network (CNN) with metric learning to extract discriminative and generalizable features, which aim to separate high-dimensional features within/outside fine-grained categories.

However, training a CNN for FGIR with metric learning methods such as the pairwise loss and the triplet loss usually yields low accuracy and slow training, because these losses capture only local structure and losses based on mean square error (MSE) have been shown to get stuck in local optima [8]. To address this, the Centralized Ranking Loss (CRL) [9] and the Decorrelated Global Centralized Ranking Loss (DGCRL) [10] were proposed. CRL is a global-structure loss that replaces the anchor of the triplet loss with a centralized anchor, while DGCRL uses a fully connected layer in place of the centralized anchor and then trains the CNN with the cross entropy loss.
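The contrast between a local triplet term and a global centralized anchor can be sketched as follows. This is an illustration only: the exact CRL formulation is given in [9], and the per-class centers here are a hypothetical stand-in for its centralized anchors.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: a local term that only sees one
    (anchor, positive, negative) triple at a time."""
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)
    d_an = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_ap - d_an + margin)

def centralized_ranking_loss(embeddings, labels, centers, margin=0.2):
    """Illustrative global variant in the spirit of CRL: each example is
    ranked against fixed per-class centers rather than sampled triples."""
    losses = []
    for x, y in zip(embeddings, labels):
        d_pos = np.sum((x - centers[y]) ** 2)            # distance to own class center
        d_negs = [np.sum((x - c) ** 2)                   # distances to other centers
                  for k, c in centers.items() if k != y]
        losses.append(max(0.0, d_pos - min(d_negs) + margin))
    return float(np.mean(losses))
```

Because every example is compared against all class centers, the global variant avoids the sampling-dependent, local view that slows triplet training.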

In this paper, we propose the Piecewise Cross Entropy loss to enhance model generalization. To further improve performance in FGIR, we replace the cross entropy loss in DGCRL [10] with the proposed loss, yielding the Decorrelated Global Piecewise Centralized Ranking Loss. In our experiments, we find that the proposed Piecewise Cross Entropy loss enhances model generalization and improves performance not only in FGIR but also in FGVC. Furthermore, our method outperforms the state-of-the-art in FGIR, achieving 86.7 Recall@1 on CARS196 [5] and 70.1 Recall@1 on CUB-200-2011 [6] with ResNet50.

Section snippets

Fine-Grained Image Retrieval (FGIR)

FGIR has attracted increasing research attention [[9], [10], [11], [12], [13]] in recent years. There are two challenging problems in FGIR: 1) small inter-class variance; 2) large intra-class variance. Existing works in FGIR fall into two main categories: the first is based on classical hand-crafted features [11], and the second utilizes deep metric learning [12,7,14] or attention modules [12,9] to make a CNN extract discriminative features. Recently, the

Method

As shown in Fig. 1, the proposed method consists of a training stage and a testing stage. In the training stage, the model can be regarded as a general classification model with a Normalize-Scale layer and is trained with the Decorrelated Global Piecewise Centralized Ranking Loss (DGPCRL). In the testing stage, the model serves as a feature extractor that produces image features for FGIR.
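A Normalize-Scale layer L2-normalizes each feature vector and multiplies it by a fixed scale so that the softmax logits retain a useful dynamic range. A minimal sketch, assuming the standard normalize-then-scale formulation (the scale α is the hyperparameter discussed later):

```python
import numpy as np

def normalize_scale(features, alpha=100.0, eps=1e-12):
    """Normalize-Scale layer: L2-normalize each row, then multiply by a
    fixed scale alpha so every feature vector has norm alpha."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return alpha * features / (norms + eps)
```

At test time the scale is irrelevant for retrieval, since ranking by cosine similarity only depends on the normalized directions of the feature vectors.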

Datasets and evaluation protocols

We evaluate retrieval performance on two widely-used benchmarks: CARS196 [5] and CUB-200-2011 [6]. CARS196 contains 196 car classes with 16,185 images, and CUB-200-2011 contains 200 bird classes with 11,788 images. Following previous works [31,14,7,9,10], we employ the first 98 classes of CARS196 and the first 100 classes of CUB-200-2011 for training. And then we
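The Recall@K metric reported in these experiments counts a query as a hit if at least one of its K nearest neighbours shares its class label. A minimal sketch with cosine similarity over extracted features:

```python
import numpy as np

def recall_at_k(features, labels, k=1):
    """Recall@K: fraction of queries whose K nearest neighbours
    (excluding the query itself) contain at least one same-class image."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)            # exclude the query itself
    nn = np.argsort(-sim, axis=1)[:, :k]      # indices of the K nearest neighbours
    hits = (labels[nn] == labels[:, None]).any(axis=1)
    return float(hits.mean())
```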

Ablation study: the parameter γ

In our scheme, there are three important hyperparameters: the scale α, the weight λ and the threshold γ. The previous work [10] found that FGIR performance is stable when α is between 16 and 128, so we set α = 100. λ is the hyperparameter for the vertical constraint, and we set λ = 0.1, the same as in [10]. The threshold γ is the most important hyperparameter of the proposed loss; during training, the classification output p_yi oscillates around it. And
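The exact piecewise definition of the loss appears in the paper's Method section; as a hypothetical illustration of the behavior described here, one continuous realization would keep the standard cross entropy below γ and flip the gradient's direction above it, so that the true-class probability p_yi settles near γ instead of saturating at 1. The constants in the second branch are an assumption chosen only to make the function continuous at γ:

```python
import numpy as np

def piecewise_ce(p_true, gamma=0.7):
    """Hypothetical piecewise cross-entropy sketch (not the paper's exact
    formula): below gamma, -log(p) pulls the true-class probability up;
    above gamma, the loss increases with p, pushing it back down, so p
    oscillates around the threshold during training."""
    p = np.asarray(p_true, dtype=float)
    return np.where(p <= gamma,
                    -np.log(p),                          # standard CE region
                    -np.log(gamma) + np.log(p / gamma))  # penalty grows past gamma
```

Under this sketch the loss is minimized at p = gamma, which matches the observation that the classification output shakes around the threshold rather than being driven to 1, limiting overfitting on the training classes.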

Conclusion

In this paper, we propose a variant of the cross entropy loss, named the Piecewise Cross Entropy loss, to enhance model generalization. The Piecewise Cross Entropy loss not only improves performance in FGIR and FGVC without extra computation in the testing stage, but is also simple to implement. Compared with previous works in FGIR, we obtain the best performance on CARS196 and CUB-200-2011.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests:

Xianxian Zeng is a PhD student at Guangdong University of Technology.

Xiaodong Wang is a PhD student at Guangdong University of Technology.

Kairui Chen is an Associate Professor at Guangzhou University.

Dong Li is an Associate Professor at Guangdong University of Technology.

Weijun Yang is an Associate Professor at Guangzhou City Polytechnic.

Acknowledgments

This work was supported by National Natural Science Foundation of China: 61503084, U1501251, Natural Science Foundation of Guangdong Province, China: 2016A030310348 and the Science and Technology Program of Guangzhou, China: 201804010098.


    This paper has been recommended for acceptance by S. Todorovic.
