Elsevier

Neurocomputing

Volume 106, 15 April 2013, Pages 103-114

An efficient indexing method for content-based image retrieval

https://doi.org/10.1016/j.neucom.2012.10.021

Abstract

In this paper, we propose an efficient indexing method for content-based image retrieval. The proposed method introduces ordered quantization to increase the distinctiveness of the quantized feature descriptors. Feature point correspondences can then be determined from the quantized feature descriptors and used to measure the similarity between the query image and each database image. To implement this scheme efficiently, a multi-dimensional inverted index is proposed to count the feature point correspondences, and approximate RANSAC is investigated to estimate the spatial correspondences of feature points between the query image and the candidate images returned from the multi-dimensional inverted index. The experimental results demonstrate that our indexing method improves retrieval efficiency while maintaining retrieval accuracy in content-based image retrieval.

Highlights

• An efficient indexing method is proposed for content-based image retrieval.
• The indexing method employs feature point correspondences to retrieve images.
• The ordered quantization determines the feature point correspondences.
• The multi-dimensional inverted index counts the number of feature point correspondences.
• Approximate RANSAC estimates the spatial correspondences of feature points.

Introduction

Content-based image retrieval has attracted increasing interest in recent years. Given a query image, an image retrieval system retrieves images of the same object or scene from an image database. Because such databases contain large collections of images, efficiency is an important factor in content-based image retrieval. Developing an efficient indexing method for content-based image retrieval is therefore of great significance.

Recently, the bag-of-visual-words model [25] has been widely used in indexing methods. In this model, feature points are detected in the images and feature descriptors are computed to describe them. A sample group of feature descriptors is then clustered into a vocabulary of visual words: each cluster center is defined as a visual word, and each feature descriptor is assigned to its nearest visual word. An image is thereby viewed as a bag of visual words and represented by a frequency histogram of visual words. The similarity between the query image and a database image is computed from the normalized visual word histograms, which are defined as image vectors. To speed up the on-line query process, the database image vectors are generated off-line.
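The quantization, histogram, and cosine-similarity steps described above can be sketched as follows. This is a minimal NumPy illustration of the general bag-of-visual-words pipeline, not any particular system's implementation; the toy vocabulary and descriptors are invented for the example.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each feature descriptor to the index of its nearest visual word."""
    # Squared Euclidean distance between every descriptor and every visual word
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def image_vector(descriptors, vocabulary):
    """Represent an image as an L2-normalized histogram of visual words."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def cosine_similarity(u, v):
    """Dot product of two already L2-normalized image vectors."""
    return float(u @ v)

# Toy example: a vocabulary of 3 visual words in a 2-D descriptor space
vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query = np.array([[0.1, 0.2], [9.8, 0.3], [9.9, -0.1]])
db    = np.array([[0.2, 0.1], [10.1, 0.2]])
print(cosine_similarity(image_vector(query, vocab),
                        image_vector(db, vocab)))  # ≈ 0.9487
```

In a real system the vocabulary comes from clustering a large sample of descriptors (e.g. with k-means), and descriptors are high-dimensional (e.g. 128-D SIFT); the toy 2-D data only keeps the sketch readable.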

Based on the bag-of-visual-words model, many state-of-the-art indexing methods have been proposed to improve retrieval performance. In Ref. [23], Philbin et al. proposed an indexing method that employs Visual Word quantization and a Traditional Inverted index (VWTI). In the visual word quantization, approximate k-means is used to cluster the feature descriptors of the database images, and each feature descriptor is assigned to its nearest visual word. TF-IDF weighting [32] is then employed to compute the visual word frequencies and generate the image vectors. Exploiting the sparsity of the image vectors, a traditional inverted index is constructed to index the database images. When the query image vector is searched in the inverted index, the cosine similarity between the query image vector and each database image vector is computed, and the database images with the greatest similarity are returned as the retrieval results. In Ref. [24], Philbin et al. further proposed an indexing method that employs Soft quantization and a Traditional Inverted index (STI). In the soft quantization, each feature descriptor is mapped to a weighted combination of visual words, with the weights computed from the Gaussian distance between the feature descriptor and each visual word. TF-IDF weighting is again employed to generate the image vectors, but the visual word frequency is replaced by the normalized weights assigned to each visual word. To retrieve the most similar database images, the traditional inverted index is constructed from the database image vectors, and the cosine similarity between the query image vector and each database image vector is computed in the inverted index.
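The TF-IDF weighting and inverted-index scoring shared by these methods can be sketched in plain Python. This is a simplified illustration of the general scheme, not the VWTI or STI implementations; the function names and toy data are our own.

```python
import math
from collections import defaultdict

def build_index(db_images, vocab_size):
    """db_images: one list of visual-word ids per database image.
    Returns an inverted index mapping word -> [(image_id, weight)], with
    L2-normalized TF-IDF weights, plus the IDF table."""
    n = len(db_images)
    df = [0] * vocab_size                     # document frequency per word
    for words in db_images:
        for w in set(words):
            df[w] += 1
    idf = [math.log(n / d) if d > 0 else 0.0 for d in df]

    index = defaultdict(list)
    for img_id, words in enumerate(db_images):
        tf = defaultdict(int)
        for w in words:
            tf[w] += 1
        vec = {w: (c / len(words)) * idf[w] for w, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            index[w].append((img_id, v / norm))
    return index, idf

def search(index, idf, query_words, top_k=3):
    """Score database images by cosine similarity, touching only the posting
    lists of words present in the query (this is where sparsity pays off)."""
    tf = defaultdict(int)
    for w in query_words:
        tf[w] += 1
    qvec = {w: (c / len(query_words)) * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in qvec.values())) or 1.0
    scores = defaultdict(float)
    for w, qv in qvec.items():
        for img_id, dv in index.get(w, []):
            scores[img_id] += (qv / norm) * dv
    return sorted(scores.items(), key=lambda s: -s[1])[:top_k]

# Toy example: 4-word vocabulary, three database images as word-id lists
index, idf = build_index([[0, 1, 1], [2, 2, 3], [0, 3]], vocab_size=4)
print(search(index, idf, [0, 1]))  # image 0 shares both query words
```

Because only the posting lists of query words are visited, the query cost grows with the number of images sharing those words rather than with the database size, which is why the sparsity of the image vectors governs the efficiency of the inverted index.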

Although the above two indexing methods are simple and effective, they have some limitations for content-based image retrieval. In the VWTI method, quantized feature descriptors that are represented by the same visual word cannot be distinguished from one another, so the distinctiveness of the quantized feature descriptors is decreased. Moreover, computing the visual word frequencies adds computational cost to the retrieval process, and computing the cosine similarity between image vectors in the inverted index further increases the computational burden and reduces the retrieval efficiency. In the STI method, representing each quantized feature descriptor by a weighted combination of visual words decreases the sparsity of the image vectors, and computing the Gaussian distances adds computational cost to the retrieval process. Because the computational efficiency of the inverted index depends on the sparsity of the image vectors, this decreased sparsity further increases the computational cost in the inverted index and reduces the retrieval efficiency.

In this paper, we propose an indexing method that employs feature point correspondences to improve retrieval efficiency. Once the feature point correspondences are determined, there is no need to generate image vectors or to compute the cosine similarity between them. For this purpose, we introduce ordered quantization, which assigns the feature descriptors to ordered visual words. The ordered visual words distinguish quantized feature descriptors that originally belong to the same visual word, thereby increasing the distinctiveness of the quantized feature descriptors. Thus, if quantized feature descriptors are represented by the same ordered visual words, feature point correspondences can be generated to compute the similarity between the query image and a database image. To compute the image similarity efficiently, a multi-dimensional inverted index is constructed to organize the database images and count the number of feature point correspondences. Afterwards, approximate RANSAC is investigated to estimate the spatial correspondences of feature points between the query image and the candidate images returned from the multi-dimensional inverted index. The final retrieved images are the candidate images with the greatest spatial similarity.
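The correspondence-counting idea can be illustrated with a toy index. This is a hedged sketch only: we represent each quantized descriptor by an opaque hashable key (here, a tuple standing in for its ordered visual words), which flattens the paper's multi-dimensional structure into a single dictionary; the data structure and names below are ours, not the authors'.

```python
from collections import defaultdict

def build_correspondence_index(db_images):
    """db_images: one list of hashable quantized-descriptor keys per image,
    e.g. tuples of ordered visual-word ids. The index maps each key to the
    ids of the database images containing a descriptor with that key."""
    index = defaultdict(list)
    for img_id, keys in enumerate(db_images):
        for k in set(keys):
            index[k].append(img_id)
    return index

def count_correspondences(index, query_keys):
    """Vote for database images: each query descriptor whose key appears in
    the index contributes one feature point correspondence per listed image.
    Images are ranked by their number of correspondences."""
    votes = defaultdict(int)
    for k in query_keys:
        for img_id in index.get(k, []):
            votes[img_id] += 1
    return sorted(votes.items(), key=lambda v: -v[1])

# Toy example: three database images, each with quantized-descriptor keys
index = build_correspondence_index([[(1, 2), (3, 4)], [(1, 2)], [(5, 6)]])
print(count_correspondences(index, [(1, 2), (3, 4)]))  # image 0 gets 2 votes
```

The point of the sketch is that similarity reduces to integer vote counting over exact key matches, with no image vectors and no cosine computation; the top-voted images would then go on to spatial verification.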

The remainder of this paper is organized as follows. Section 2 reviews related work on indexing methods. Section 3 details our indexing method. Section 4 presents the experiments performed to evaluate the indexing method. Section 5 summarizes the paper.


Related work

With the development of text retrieval and local invariant features, the Video Google indexing method [25] introduced the bag-of-visual-words model into content-based image retrieval. Based on the Video Google framework, numerous indexing methods have been proposed for content-based image retrieval.

There are many indexing methods that employ an inverted index for content-based image retrieval. In Ref. [22], hierarchical k-means is proposed to quantize the feature descriptors, and TF-IDF

Motivations

Our indexing method is inspired by the image matching approach of Ref. [17]. Given known feature point correspondences, we can easily compute the similarity between the query image and a database image, and quickly retrieve the most similar database images. However, because of the large number of images in the database, it is not feasible to match images by directly comparing the feature descriptors. To solve this problem, the feature point correspondences are generated by the quantized feature
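To make the role of spatial verification concrete, the following is a minimal RANSAC-style sketch over feature point correspondences. It fits only a 2-D translation for brevity, whereas real spatial verification estimates a richer transform (e.g. a similarity or affine model, as in RANSAC [Fischler and Bolles]); the function name, parameters, and toy data are illustrative, not the paper's approximate RANSAC.

```python
import random

def ransac_inliers(matches, n_iters=200, tol=2.0, seed=0):
    """matches: list of ((xq, yq), (xd, yd)) feature point correspondences
    between a query image and a database image. Repeatedly hypothesize a
    2-D translation from one randomly chosen match, count how many
    correspondences it explains within `tol` pixels, and return the best
    inlier count as a spatial-consistency score."""
    rng = random.Random(seed)
    best = 0
    for _ in range(n_iters):
        (xq, yq), (xd, yd) = rng.choice(matches)
        dx, dy = xd - xq, yd - yq          # hypothesized translation
        inliers = sum(
            1 for (ax, ay), (bx, by) in matches
            if abs(bx - (ax + dx)) <= tol and abs(by - (ay + dy)) <= tol
        )
        best = max(best, inliers)
    return best

# Toy example: four consistent matches (shifted by (5, 5)) plus one outlier
matches = [((0, 0), (5, 5)), ((1, 0), (6, 5)), ((0, 1), (5, 6)),
           ((2, 2), (7, 7)), ((0, 0), (100, 0))]
print(ransac_inliers(matches))  # → 4
```

Ranking candidate images by such an inlier count rewards correspondences that agree on a single geometric transform, which is what distinguishes a true match from accidental visual-word collisions.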

Datasets

In the experiments, three image datasets are used to test our indexing method.

The Magazine dataset [7] includes 7665 images scanned from 200 magazines. To evaluate the retrieval performance, 300 query images were captured from these magazines with different camera phones. Because the query images are taken under various conditions, they are affected by factors such as rotation, scale, illumination and noise contamination.

The UKbench dataset [22] consists of 2550 groups of 4 images. Each group

Conclusion

We have presented an efficient indexing method that employs feature point correspondences to retrieve images. The feature point correspondences are generated from quantized feature descriptors that are represented by the same ordered visual words. To compute the feature point correspondences between the query image and the database images, the multi-dimensional inverted index is proposed to count the number of feature point correspondences, and approximate RANSAC is investigated to further estimate

Acknowledgments

The authors thank the anonymous reviewers for their valuable comments. This work was supported by the Committee of Science and Technology, Shanghai (No. 11530700200) and the National Natural Science Foundation of China (Nos. 61273258 and 61105001).

Deying Feng received B.S. degree from Shandong University of Technology and M.S. degree from Shanghai Maritime University, China, respectively in 2005 and 2008. He is now a Ph.D. student in Shanghai Jiao Tong University, China. His research interests include image retrieval and similarity search.

References (34)

  • C. Hong et al., An efficient approach to content-based object retrieval in videos, Neurocomputing (2011)
  • C. Wang et al., Image retrieval using nonlinear manifold embedding, Neurocomputing (2009)
  • A.Z. Broder, On the resemblance and containment of documents, in: Proceedings of Compression and Complexity of...
  • O. Chum, A. Mikulik, M. Perdoch, J. Matas, Total recall II: query expansion revisited, in: Proceedings of IEEE Computer...
  • O. Chum, J. Philbin, J. Sivic, M. Isard, A. Zisserman, Total recall: automatic query expansion with a generative...
  • O. Chum, J. Philbin, A. Zisserman, Near duplicate image detection: min-Hash and TF-IDF weighting, in: Proceedings of...
  • Y. Cao, C. Wang, Z. Li, L.Q. Zhang, L. Zhang, Spatial-bag-of-features, in: Proceedings of IEEE Computer Society...
  • J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in:...
  • D. Feng, J. Yang, C. Yang, Efficient indexing for mobile image retrieval, in: Proceedings of IEEE International...
  • M. Fischler et al., Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography, Communications of the ACM (1981)
  • A. Gionis, P. Indyk, R. Motwani, Similarity search in high dimensions via hashing, in: Proceedings of the 25th...
  • N. Guan et al., Non-negative patch alignment framework, IEEE Transactions on Neural Networks (2011)
  • N. Guan et al., Online nonnegative matrix factorization with robust stochastic approximation, IEEE Transactions on Neural Networks and Learning Systems (2012)
  • N. Guan et al., NeNMF: an optimal gradient method for non-negative matrix factorization, IEEE Transactions on Signal Processing (2012)
  • X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: Proceedings of IEEE Computer Society Conference...
  • Y.G. Jiang, C.W. Ngo, J. Yang, Towards optimal bag-of-features for object categorization and semantic video retrieval,...
  • D.G. Lowe, Object recognition from local scale-invariant features, in: Proceedings of IEEE International Conference on...


    Jie Yang received Ph.D. degree in computer science from Hamburg University, Germany, in 1994. Currently, he is a professor of the Institute of Image Processing and Pattern Recognition in Shanghai Jiao Tong University, China. His research interests include image retrieval, object detection and recognition, data mining and medical image processing.

    Congxin Liu received B.S. degree from Wuhan University of Hydraulic and Electrical Engineering and M.S. degree from Three Gorges University, China, respectively in 1997 and 2004. He is now a Ph.D. student in Shanghai Jiao Tong University, China. His research interests include local invariant feature and image matching.
