Elsevier

Neurocomputing

Volume 106, 15 April 2013, Pages 103-114

An efficient indexing method for content-based image retrieval

https://doi.org/10.1016/j.neucom.2012.10.021

Abstract

In this paper, we propose an efficient indexing method for content-based image retrieval. The proposed method introduces ordered quantization to increase the distinctiveness of the quantized feature descriptors. Feature point correspondences can then be determined from the quantized feature descriptors and used to measure the similarity between the query image and each database image. To implement this scheme efficiently, a multi-dimensional inverted index is proposed to count the feature point correspondences, and approximate RANSAC is investigated to estimate the spatial correspondences of feature points between the query image and the candidate images returned from the multi-dimensional inverted index. The experimental results demonstrate that our indexing method improves retrieval efficiency while maintaining retrieval accuracy in content-based image retrieval.

Highlights

• An efficient indexing method is proposed for content-based image retrieval.
• The indexing method employs feature point correspondences to retrieve images.
• The ordered quantization determines the feature point correspondences.
• The multi-dimensional inverted index counts the number of feature point correspondences.
• Approximate RANSAC estimates the spatial correspondences of feature points.

Introduction

Content-based image retrieval has attracted increasing interest in recent years. Given a query image, an image retrieval system retrieves images of the same object or scene from an image database. Because such databases contain large collections of images, efficiency is an important factor in content-based image retrieval. Developing an efficient indexing method for content-based image retrieval is therefore of great significance.

Recently, the bag-of-visual-words model [25] has been widely used in indexing methods. In this model, feature points are detected in the images and feature descriptors are computed to describe them. A sample group of feature descriptors is then clustered into a vocabulary of visual words: each cluster center is defined as a visual word, and each feature descriptor is assigned to its nearest visual word. An image is thereby viewed as a bag of visual words and represented by a frequency histogram of visual words. The similarity between the query image and a database image is computed from the normalized visual word histograms, which are defined as image vectors. To speed up the on-line query process, the database image vectors are generated off-line.
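The quantization, histogram, and cosine-similarity steps described above can be sketched as follows. This is a minimal NumPy illustration of the general bag-of-visual-words pipeline, not any particular system's implementation; the toy vocabulary and descriptors are invented for the example.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each feature descriptor to the index of its nearest visual word."""
    # Squared Euclidean distance between every descriptor and every visual word
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def image_vector(descriptors, vocabulary):
    """Represent an image as an L2-normalized histogram of visual words."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def cosine_similarity(u, v):
    """Dot product of two already L2-normalized image vectors."""
    return float(u @ v)

# Toy example: a vocabulary of 3 visual words in a 2-D descriptor space
vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query = np.array([[0.1, 0.2], [9.8, 0.3], [9.9, -0.1]])
db    = np.array([[0.2, 0.1], [10.1, 0.2]])
print(cosine_similarity(image_vector(query, vocab),
                        image_vector(db, vocab)))  # ≈ 0.9487
```

In a real system the vocabulary comes from clustering a large sample of descriptors (e.g. with k-means), and descriptors are high-dimensional (e.g. 128-D SIFT); the toy 2-D data only keeps the sketch readable.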

Based on the bag-of-visual-words model, many state-of-the-art indexing methods have been proposed to improve retrieval performance. In Ref. [23], Philbin et al. proposed an indexing method that employs Visual Word quantization and a Traditional Inverted index (VWTI). In the visual word quantization, approximate k-means is used to cluster the feature descriptors of the database images, and each feature descriptor is assigned to its nearest visual word. TF-IDF weighting [32] is then employed to compute the visual word frequencies and generate the image vectors. Exploiting the sparsity of the image vectors, a traditional inverted index is constructed to index the database images. When the query image vector is searched in the inverted index, the cosine similarity between the query image vector and each database image vector is computed, and the database images with the greatest similarity are returned as the retrieval results. In Ref. [24], Philbin et al. further proposed an indexing method that employs Soft quantization and a Traditional Inverted index (STI). In the soft quantization, each feature descriptor is mapped to a weighted combination of visual words, with the weights computed from the Gaussian distance between the feature descriptor and each visual word. TF-IDF weighting is again employed to generate the image vectors, but the visual word frequency is replaced by the normalized weights assigned to each visual word. To retrieve the most similar database images, the traditional inverted index is constructed from the database image vectors, and the cosine similarity between the query image vector and each database image vector is computed in the inverted index.
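The TF-IDF weighting and inverted-index scoring shared by these methods can be sketched in plain Python. This is a simplified illustration of the general scheme, not the VWTI or STI implementations; the function names and toy data are our own.

```python
import math
from collections import defaultdict

def build_index(db_images, vocab_size):
    """db_images: one list of visual-word ids per database image.
    Returns an inverted index mapping word -> [(image_id, weight)], with
    L2-normalized TF-IDF weights, plus the IDF table."""
    n = len(db_images)
    df = [0] * vocab_size                     # document frequency per word
    for words in db_images:
        for w in set(words):
            df[w] += 1
    idf = [math.log(n / d) if d > 0 else 0.0 for d in df]

    index = defaultdict(list)
    for img_id, words in enumerate(db_images):
        tf = defaultdict(int)
        for w in words:
            tf[w] += 1
        vec = {w: (c / len(words)) * idf[w] for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            index[w].append((img_id, v / norm))
    return index, idf

def search(index, idf, query_words, top_k=3):
    """Score database images by cosine similarity, touching only the posting
    lists of words present in the query (this is where sparsity pays off)."""
    tf = defaultdict(int)
    for w in query_words:
        tf[w] += 1
    qvec = {w: (c / len(query_words)) * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in qvec.values())) or 1.0
    scores = defaultdict(float)
    for w, qv in qvec.items():
        for img_id, dv in index.get(w, []):
            scores[img_id] += (qv / norm) * dv
    return sorted(scores.items(), key=lambda s: -s[1])[:top_k]

# Toy example: 4-word vocabulary, three database images as word-id lists
index, idf = build_index([[0, 1, 1], [2, 2, 3], [0, 3]], vocab_size=4)
print(search(index, idf, [0, 1]))  # image 0 shares both query words
```

Because only the posting lists of query words are visited, the query cost grows with the number of images sharing those words rather than with the database size, which is why the sparsity of the image vectors governs the efficiency of the inverted index.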

Although the above two indexing methods are simple and effective, they have some limitations for content-based image retrieval. In the VWTI method, quantized feature descriptors that are represented by the same visual word cannot be distinguished from one another, so the distinctiveness of the quantized feature descriptors is decreased. Moreover, computing the visual word frequencies adds computational cost to the retrieval process, and computing the cosine similarity between image vectors in the inverted index further increases the computational burden and reduces the retrieval efficiency. In the STI method, representing each quantized feature descriptor by a weighted combination of visual words decreases the sparsity of the image vectors, and computing the Gaussian distances adds computational cost to the retrieval process. Because the computational efficiency of the inverted index depends on the sparsity of the image vectors, this decreased sparsity further increases the computational cost in the inverted index and reduces the retrieval efficiency.

In this paper, we propose an indexing method that employs feature point correspondences to improve retrieval efficiency. Once the feature point correspondences are determined, there is no need to generate image vectors or to compute the cosine similarity between them. For this purpose, we introduce ordered quantization, which assigns the feature descriptors to ordered visual words. The ordered visual words distinguish quantized feature descriptors that originally belong to the same visual word, thereby increasing the distinctiveness of the quantized feature descriptors. Thus, if quantized feature descriptors are represented by the same ordered visual words, feature point correspondences can be generated to compute the similarity between the query image and a database image. To compute the image similarity efficiently, a multi-dimensional inverted index is constructed to organize the database images and count the number of feature point correspondences. Afterwards, approximate RANSAC is investigated to estimate the spatial correspondences of feature points between the query image and the candidate images returned from the multi-dimensional inverted index. The final retrieved images are the candidate images with the greatest spatial similarity.
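The correspondence-counting idea can be illustrated with a toy index. This is a hedged sketch only: we represent each quantized descriptor by an opaque hashable key (here, a tuple standing in for its ordered visual words), which flattens the paper's multi-dimensional structure into a single dictionary; the data structure and names below are ours, not the authors'.

```python
from collections import defaultdict

def build_correspondence_index(db_images):
    """db_images: one list of hashable quantized-descriptor keys per image,
    e.g. tuples of ordered visual-word ids. The index maps each key to the
    ids of the database images containing a descriptor with that key."""
    index = defaultdict(list)
    for img_id, keys in enumerate(db_images):
        for k in set(keys):
            index[k].append(img_id)
    return index

def count_correspondences(index, query_keys):
    """Vote for database images: each query descriptor whose key appears in
    the index contributes one feature point correspondence per listed image.
    Images are ranked by their number of correspondences."""
    votes = defaultdict(int)
    for k in query_keys:
        for img_id in index.get(k, []):
            votes[img_id] += 1
    return sorted(votes.items(), key=lambda v: -v[1])

# Toy example: three database images, each with quantized-descriptor keys
index = build_correspondence_index([[(1, 2), (3, 4)], [(1, 2)], [(5, 6)]])
print(count_correspondences(index, [(1, 2), (3, 4)]))  # image 0 gets 2 votes
```

The point of the sketch is that similarity reduces to integer vote counting over exact key matches, with no image vectors and no cosine computation; the top-voted images would then go on to spatial verification.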

The remainder of this paper is organized as follows. Section 2 reviews related work on indexing methods. Section 3 details our indexing method. Section 4 presents the experiments performed to evaluate the indexing method. Section 5 summarizes the paper.


Related work

With the development of text retrieval and local invariant features, the Video Google indexing method [25] introduced the bag-of-visual-words model into content-based image retrieval. Based on the Video Google framework, numerous indexing methods have been proposed for content-based image retrieval.

There are many indexing methods that employ an inverted index for content-based image retrieval. In Ref. [22], hierarchical k-means is proposed to quantize the feature descriptors, and TF-IDF

Motivations

Our indexing method is inspired by the image matching approach of Ref. [17]. Given known feature point correspondences, we can easily compute the similarity between the query image and a database image, and quickly retrieve the most similar database images. However, because of the large number of images in the database, it is not feasible to match images by directly comparing the feature descriptors. To solve this problem, the feature point correspondences are generated by the quantized feature
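To make the role of spatial verification concrete, the following is a minimal RANSAC-style sketch over feature point correspondences. It fits only a 2-D translation for brevity, whereas real spatial verification estimates a richer transform (e.g. a similarity or affine model, as in RANSAC [Fischler and Bolles]); the function name, parameters, and toy data are illustrative, not the paper's approximate RANSAC.

```python
import random

def ransac_inliers(matches, n_iters=200, tol=2.0, seed=0):
    """matches: list of ((xq, yq), (xd, yd)) feature point correspondences
    between a query image and a database image. Repeatedly hypothesize a
    2-D translation from one randomly chosen match, count how many
    correspondences it explains within `tol` pixels, and return the best
    inlier count as a spatial-consistency score."""
    rng = random.Random(seed)
    best = 0
    for _ in range(n_iters):
        (xq, yq), (xd, yd) = rng.choice(matches)
        dx, dy = xd - xq, yd - yq          # hypothesized translation
        inliers = sum(
            1 for (ax, ay), (bx, by) in matches
            if abs(bx - (ax + dx)) <= tol and abs(by - (ay + dy)) <= tol
        )
        best = max(best, inliers)
    return best

# Toy example: four consistent matches (shifted by (5, 5)) plus one outlier
matches = [((0, 0), (5, 5)), ((1, 0), (6, 5)), ((0, 1), (5, 6)),
           ((2, 2), (7, 7)), ((0, 0), (100, 0))]
print(ransac_inliers(matches))  # → 4
```

Ranking candidate images by such an inlier count rewards correspondences that agree on a single geometric transform, which is what distinguishes a true match from accidental visual-word collisions.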

Datasets

In the experiments, three image datasets are used to test our indexing method.

The Magazine dataset [7] includes 7665 images scanned from 200 magazines. To evaluate the retrieval performance, 300 query images were captured from these magazines with different camera phones. Because the query images are taken under various conditions, they are affected by factors such as rotation, scale, illumination and noise contamination.

The UKbench dataset [22] consists of 2550 groups of 4 images. Each group

Conclusion

We have presented an efficient indexing method that employs feature point correspondences to retrieve images. The feature point correspondences are generated from quantized feature descriptors that are represented by the same ordered visual words. To compute the feature point correspondences between the query image and the database images, the multi-dimensional inverted index is proposed to count the number of feature point correspondences, and approximate RANSAC is investigated to further estimate

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Committee of Science and Technology, Shanghai (No. 11530700200) and the National Natural Science Foundation of China (Nos. 61273258 and 61105001).

Deying Feng received B.S. degree from Shandong University of Technology and M.S. degree from Shanghai Maritime University, China, respectively in 2005 and 2008. He is now a Ph.D. student in Shanghai Jiao Tong University, China. His research interests include image retrieval and similarity search.

References (34)

  • C. Hong et al., An efficient approach to content-based object retrieval in videos, Neurocomputing (2011)
  • C. Wang et al., Image retrieval using nonlinear manifold embedding, Neurocomputing (2009)
  • A.Z. Broder, On the resemblance and containment of documents, in: Proceedings of Compression and Complexity of...
  • O. Chum, A. Mikulik, M. Perdoch, J. Matas, Total recall II: query expansion revisited, in: Proceedings of IEEE Computer...
  • O. Chum, J. Philbin, J. Sivic, M. Isard, A. Zisserman, Total recall: automatic query expansion with a generative...
  • O. Chum, J. Philbin, A. Zisserman, Near duplicate image detection: min-Hash and TF-IDF weighting, in: Proceedings of...
  • Y. Cao, C. Wang, Z. Li, L.Q. Zhang, L. Zhang, Spatial-bag-of-features, in: Proceedings of IEEE Computer Society...
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in:...
  • D. Feng, J. Yang, C. Yang, Efficient indexing for mobile image retrieval, in: Proceedings of IEEE International...
  • M. Fischler et al., Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography, Communications of the ACM (1981)
  • A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in: Proceedings of the 25th...
  • N. Guan et al., Non-negative patch alignment framework, IEEE Transactions on Neural Networks (2011)
  • N. Guan et al., Online nonnegative matrix factorization with robust stochastic approximation, IEEE Transactions on Neural Networks and Learning Systems (2012)
  • N. Guan et al., NeNMF: an optimal gradient method for non-negative matrix factorization, IEEE Transactions on Signal Processing (2012)
  • X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: Proceedings of IEEE Computer Society Conference...
  • Y.G. Jiang, C.W. Ngo, J. Yang, Towards optimal bag-of-features for object categorization and semantic video retrieval,...
  • D.G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of IEEE International Conference on...


    Jie Yang received Ph.D. degree in computer science from Hamburg University, Germany, in 1994. Currently, he is a professor of the Institute of Image Processing and Pattern Recognition in Shanghai Jiao Tong University, China. His research interests include image retrieval, object detection and recognition, data mining and medical image processing.

    Congxin Liu received B.S. degree from Wuhan University of Hydraulic and Electrical Engineering and M.S. degree from Three Gorges University, China, respectively in 1997 and 2004. He is now a Ph.D. student in Shanghai Jiao Tong University, China. His research interests include local invariant feature and image matching.
