Pattern Recognition Letters

Volume 129, January 2020, Pages 158-165

A bag of constrained informative deep visual words for image retrieval

https://doi.org/10.1016/j.patrec.2019.11.011

Highlights

  • A new bag of constrained informative deep visual words (BoCIDVW) model for image retrieval

  • Combination of deep features, information theory and constrained clustering

  • Unsupervised clustering constraints built from mutual information

Abstract

In this paper, we propose a bag of constrained informative deep visual words (BoCIDVW) model for image retrieval. Informative patches are first selected from each image using patch entropy values. Each such patch is represented by deep features extracted through VGG16-Net. Two sets of constraints, namely must-link (ML) and cannot-link (CL), are obtained for each deep informative patch in an unsupervised manner from its mutual information values with other patches. The patches are then quantized using the Linear-time Constrained Vector Quantization Error (LCVQE) algorithm, a fast yet accurate constrained K-means method. The resulting clusters, which we term constrained informative deep visual words, are employed to label each patch. Finally, a bag (histogram) of constrained informative deep visual words is built for image retrieval. Experiments on three publicly available datasets demonstrate the merit of the proposed formulation.

Introduction

Image retrieval [7] is a well-studied problem in the pattern recognition community. The Bag of Visual Words (BoVW) model and its variants have been used effectively for image retrieval for quite some time [28]. In the basic BoVW model, the constituent patches of an image are first represented by hand-crafted features such as SURF [1] or SIFT [15]. These patches are then quantized in the feature space by the K-means algorithm [13]. Finally, each image patch is assigned the label of its nearest cluster (visual word), and an image is represented by a bag (histogram) of visual words. Several works have reported improved BoVW models for image retrieval; for example, Dimitrovski et al. applied the BoVW model with predictive clustering trees to improve retrieval results [10]. Deep learning based approaches, on the other hand, have become increasingly popular for solving the retrieval problem [12], [26].
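
To fix notation for what follows, this basic BoVW pipeline can be sketched in a few lines. The sketch assumes local descriptors (e.g., SIFT or SURF vectors) are already extracted for each image; the vocabulary size k = 500 and the use of scikit-learn's KMeans are illustrative choices only, not those of any cited work.

    # Minimal BoVW sketch: descriptors (e.g., SIFT/SURF) are assumed to be
    # pre-extracted, one (n_patches, dim) array per image.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(pooled_descriptors, k=500):
        """Quantize descriptors pooled over all training images into k visual words."""
        return KMeans(n_clusters=k, n_init=10).fit(pooled_descriptors)

    def bovw_histogram(descriptors, vocab):
        """Represent one image as an L1-normalized histogram of visual-word labels."""
        words = vocab.predict(descriptors)
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)

An image is thus reduced to a fixed-length histogram, and retrieval amounts to comparing such histograms.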

In this paper, we propose a new patch-based model for image retrieval that combines deep features, information theoretic measures and constrained clustering. Not all patches extracted from an image carry significant information; in particular, patches from more homogeneous regions are less informative. Entropy is therefore used to select informative patches, i.e., patches from object regions with higher entropy values. These patches are then represented by deep features through VGG16-Net. Developing clustering constraints for the patches in a supervised manner would require storing all the patch labels, which is cumbersome. Instead, mutual information is employed to derive these constraints in an unsupervised manner. The Linear-time Constrained Vector Quantization Error (LCVQE) algorithm [23], a fast yet accurate constrained K-means method, is used to quantize the informative image patches. We term the resulting image representation model the Bag of Constrained Informative Deep Visual Words (BoCIDVW). For a preliminary version of this work, please see [19], where we developed a Bag of Constrained Visual Words (BoCVW) model. There are several key differences between the earlier conference version and the current journal version. First, the SURF features of the conference version are replaced by deep features. Second, information theory (entropy, mutual information) is used at different stages of the solution. Furthermore, we now include an algorithm, a time-complexity analysis and more theoretical exposition. Finally, exhaustive comparisons with several existing and more recent approaches are included. We summarize the contributions of this work below:

  • We use the mean and standard deviation of the entropy values of the patches in an image to select the more informative patches. For a given image, we first compute the entropy of every constituent patch, along with the mean and standard deviation of these values. We then retain the informative patches, i.e., those whose entropy is greater than or equal to the mean entropy plus one standard deviation. The selected patches are represented using deep features through the VGG16 network, yielding deep informative patches (a sketch of this selection step follows this list).

  • Mutual information is employed to develop two sets of clustering constraints (must-link and cannot-link) in an unsupervised manner. Linear-time Constrained Vector Quantization Error (LCVQE), a constrained K-means clustering algorithm, is then applied to quantize the deep informative patches (see the sketch after this list). Finally, a Bag of Constrained Informative Deep Visual Words (BoCIDVW) is built for the purpose of image retrieval. The proposed model, a combination of deep features, information theoretic measures and constrained clustering, yields very competitive results on the publicly available Coil-100, Oxford-5K and Paris-6K datasets in terms of mean average precision (mAP) values.
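
To make these steps concrete, the following minimal sketch implements entropy-based patch selection, mutual-information-based constraint generation, and one constrained assignment pass. It is an illustrative sketch under stated assumptions, not the paper's implementation: it computes histogram entropy and MI on raw 8-bit grayscale patches (whereas the paper derives MI values for deep informative patches), the thresholds tau_ml and tau_cl are hand-picked, and a COP-KMeans-style hard-constraint assignment stands in for LCVQE, which penalizes rather than forbids constraint violations.

    # Illustrative sketch only; see the assumptions stated above.
    import numpy as np

    def patch_entropy(patch, bins=256):
        """Shannon entropy (bits) of an 8-bit grayscale patch histogram."""
        hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
        p = hist[hist > 0] / hist.sum()
        return float(-(p * np.log2(p)).sum())

    def select_informative(patches):
        """Keep patches whose entropy >= mean + one standard deviation."""
        e = np.array([patch_entropy(p) for p in patches])
        keep = e >= e.mean() + e.std()
        return [p for p, k in zip(patches, keep) if k]

    def mutual_information(a, b, bins=64):
        """Histogram-based MI (bits) between two equally sized patches."""
        pxy, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
        pxy /= pxy.sum()
        px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
        nz = pxy > 0
        return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

    def build_constraints(patches, tau_ml=2.0, tau_cl=0.1):
        """High-MI pairs become must-link; low-MI pairs cannot-link.
        The thresholds tau_ml/tau_cl are illustrative, not from the paper."""
        ml, cl = [], []
        for i in range(len(patches)):
            for j in range(i + 1, len(patches)):
                mi = mutual_information(patches[i], patches[j])
                if mi >= tau_ml:
                    ml.append((i, j))
                elif mi <= tau_cl:
                    cl.append((i, j))
        return ml, cl

    def _partners(pairs, i):
        """Indices paired with point i in a constraint list."""
        return [b if a == i else a for a, b in pairs if i in (a, b)]

    def constrained_assign(X, centers, ml, cl):
        """One hard-constraint assignment pass (COP-KMeans style; LCVQE
        instead trades off violations against quantization error).
        Points with no feasible cluster keep the label -1."""
        labels = -np.ones(len(X), dtype=int)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        for i in range(len(X)):
            for c in np.argsort(dists[i]):
                if all(labels[j] in (-1, c) for j in _partners(ml, i)) and \
                   all(labels[j] != c for j in _partners(cl, i)):
                    labels[i] = int(c)
                    break
        return labels

In the full model, each retained patch would be described by VGG16 features before the constrained quantization, and the resulting visual-word labels would be accumulated into the BoCIDVW histogram in the same way as in the basic BoVW pipeline.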

The rest of the paper is organized as follows: in Section 2, we discuss related work; in Section 3, we describe our method; in Section 4, we present the experimental results; and in Section 5, we conclude the paper with directions for future research.

Section snippets

Related work

Different classes of solutions exist for the image retrieval problem [7]. We first discuss existing BoVW models for image retrieval. Then, we present some important deep learning based approaches in image retrieval.

For initial research on BoVW based solutions, please see [28]. The authors in [3] improved the retrieval results by introducing a fuzzy visual word assignment model. Mukherjee et al. [17] designed an assignment model based on an affinity function between a patch and a cluster…

Proposed method

In this section, we describe in detail the different components of the proposed BoCIDVW model.

Experimental results

We have evaluated the proposed image retrieval framework on three benchmark datasets, namely Coil-100 [20], Oxford-5K [24] and Paris-6K [25]. All experiments are carried out in the MATLAB R2018a environment on a desktop PC with an Intel Xeon(R) E5-2690 v4 CPU @ 2.60 GHz (16 cores), 128 GB of DDR2 memory and an NVIDIA Quadro K2200 GPU.

Conclusion

In this work, we have proposed a new model for image retrieval based on deep features, information theory and constrained clustering. Entropy is used to select the more informative patches, which are then represented using deep features through a pre-trained VGG16-Net. Two types of clustering constraints, namely must-link and cannot-link, are captured in an unsupervised manner using mutual information. The LCVQE algorithm is used to cluster the deep informative patches…

Declaration of Competing Interest

We hereby declare that we do not have any conflict of interest for this manuscript.

References (33)

  • I. Davidson et al.

    Clustering with constraints: feasibility issues and the k-means algorithm

    Proceedings of the 2005 SIAM International Conference on Data Mining

    (2005)
  • D.L. Davies et al.

    A cluster separation measure

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1979)
  • H. Fu et al.

    Fast semantic image retrieval based on random forest

    Proceedings of the 20th ACM International Conference on Multimedia

    (2012)
  • A. Gordo et al.

    Deep image retrieval: learning global representations for image search

    ECCV

    (2016)
  • J.A. Hartigan et al.

    Algorithm AS 136: A K-means clustering algorithm

    J. R. Stat. Soc. Ser. C

    (1979)
  • X. Li et al.

    Pairwise geometric matching for large-scale object retrieval

    CVPR

    (2015)