Tensor index for large scale image retrieval

  • Regular Paper
Multimedia Systems

Abstract

Recently, the bag-of-words representation has been widely applied in image retrieval applications. In this model, the visual word is a core component. However, compared with text retrieval, one major problem in image retrieval is visual word ambiguity, i.e., a trade-off between the precision and recall of visual matching. To address this problem, this paper proposes a tensor index structure that improves precision and recall simultaneously. Essentially, the tensor index is a multi-dimensional index structure. It combines the strengths of two state-of-the-art indexing strategies, i.e., the inverted multi-index [Babenko and Lempitsky (Computer vision and pattern recognition (CVPR), 2012 IEEE Conference, 3069–3076, 2012)] and the joint inverted index [Xia et al. (ICCV, 2013)], both of which were initially designed for approximate nearest neighbor search. This paper, instead, exploits them for image retrieval and provides insights into how to combine them effectively. We show that, on the one hand, the multi-index enhances the discriminative power of visual words, thus improving precision; on the other hand, the introduction of multiple codebooks corrects quantization artifacts, thus improving recall. Extensive experiments on two benchmark datasets demonstrate that the tensor index significantly improves over the baseline approach. Moreover, when combined with methods such as Hamming embedding, it achieves performance competitive with state-of-the-art methods.
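The abstract describes the tensor index only at a high level. As a rough illustration, the following is a minimal toy sketch built on our own assumptions, not the authors' implementation: each descriptor is split into two halves, each half is quantized with its own k-means codebook so that a (word, word) pair addresses one cell of a two-dimensional inverted multi-index, and several independently trained codebook pairs play the role of the multiple codebooks. A database image counts as matched if it shares a cell with a query descriptor under any codebook. All names (`TensorIndex`, `kmeans`, `cells`, the voting rule) are hypothetical, and the paper's actual codebook training, tf-idf weighting and Hamming embedding are not reproduced.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword in `codebook` closest to vector `v`."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; empty clusters keep their previous center."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = defaultdict(list)
        for p in points:
            buckets[nearest(centers, p)].append(p)
        for c, members in buckets.items():
            centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


class TensorIndex:
    """Hypothetical toy tensor index.

    Dimension 1: which of `num_codebooks` codebook pairs (multiple codebooks).
    Dimensions 2 and 3: visual words of the first and second descriptor halves
    (a 2-D inverted multi-index). Each (m, w1, w2) cell holds a posting list.
    """

    def __init__(self, train, words_per_half=4, num_codebooks=2):
        self.half = len(train[0]) // 2
        self.codebooks = []
        for m in range(num_codebooks):
            # Different seeds give different codebooks, hence different cell boundaries.
            first = kmeans([v[:self.half] for v in train], words_per_half, seed=2 * m)
            second = kmeans([v[self.half:] for v in train], words_per_half, seed=2 * m + 1)
            self.codebooks.append((first, second))
        self.postings = defaultdict(set)

    def cells(self, v):
        """One (codebook, word_1, word_2) cell per codebook pair."""
        return [(m, nearest(c1, v[:self.half]), nearest(c2, v[self.half:]))
                for m, (c1, c2) in enumerate(self.codebooks)]

    def add(self, image_id, descriptors):
        """Post `image_id` into the cell of every descriptor under every codebook."""
        for v in descriptors:
            for cell in self.cells(v):
                self.postings[cell].add(image_id)

    def query(self, descriptors):
        """Rank images by how many query descriptors they match in ANY codebook."""
        votes = defaultdict(int)
        for v in descriptors:
            matched = set()
            for cell in self.cells(v):
                matched |= self.postings.get(cell, set())  # union over codebooks
            for image_id in matched:
                votes[image_id] += 1
        # Highest vote count first; ties broken by image id for determinism.
        return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    rng = random.Random(42)
    train = [[rng.random() for _ in range(8)] for _ in range(200)]
    index = TensorIndex(train, words_per_half=4, num_codebooks=2)
    index.add("img_a", train[:20])
    index.add("img_b", train[100:120])
    print(index.query(train[:5]))
```

In this sketch, addressing a posting list by the product cell (w1, w2) rather than by a single word narrows each list, which is the precision argument made for the multi-index; taking the union over codebook pairs gives a descriptor that falls near a cell boundary in one codebook a second chance in another, which is the recall argument made for multiple codebooks.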

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

References

  1. Arandjelovic, R., Zisserman, A.: Three things everyone should know to improve object retrieval. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 2911–2918. IEEE (2012)

  2. Babenko, A., Lempitsky, V.: The inverted multi-index. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 3069–3076. IEEE (2012)

  3. Bai, S., Wang, X., Yao, C., Bai, X.: Multiple stage residual model for accurate image classification. In: Computer Vision-ACCV 2012. Springer (2014)

  4. Boix, X., Roig, G., Leistner, C., Van Gool, L.: Nested sparse quantization for efficient feature coding. In: Computer Vision-ECCV 2012, pp. 744–758. Springer (2012)

  5. Cai, J., Liu, Q., Chen, F., Joshi, D., Tian, Q.: Scalable image search with multiple index tables. In: Proceedings of International Conference on Multimedia Retrieval, p. 407. ACM (2014)

  6. Cai, Y., Tong, W., Yang, L., Hauptmann, A.G.: Constrained keypoint quantization: towards better bag-of-words model for large-scale multimedia retrieval. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, p. 16. ACM (2012)

  7. van Gemert, J.C., Veenman, C.J., Smeulders, A.W., Geusebroek, J.M.: Visual word ambiguity. Pattern Anal. Mach. Intell. IEEE Trans. 32(7), 1271–1283 (2010)

  8. Huiskes, M.J., Thomee, B., Lew, M.S.: New trends and ideas in visual concept detection: the MIR Flickr retrieval evaluation initiative. In: Proceedings of the International Conference on Multimedia Information Retrieval, pp. 527–536. ACM (2010)

  9. Jégou, H., Chum, O.: Negative evidences and co-occurences in image retrieval: the benefit of PCA and whitening. In: Computer Vision-ECCV 2012, pp. 774–787. Springer (2012)

  10. Jégou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: Computer Vision-ECCV 2008, pp. 304–317. Springer (2008)

  11. Jégou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 1169–1176. IEEE (2009)

  12. Jégou, H., Douze, M., Schmid, C.: Improving bag-of-features for large scale image search. Int. J. Comput. Vis. 87(3), 316–336 (2010)

  13. Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. Pattern Anal. Mach. Intell. IEEE Trans. 33(1), 117–128 (2011)

  14. Jégou, H., Schmid, C., Harzallah, H., Verbeek, J.: Accurate image search using the contextual dissimilarity measure. Pattern Anal. Mach. Intell. IEEE Trans. 32(1), 2–11 (2010)

  15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

  16. Liu, J., Wang, S.: Salient region detection via simple local and global contrast representation. Neurocomputing 147, 435–443 (2015)

  17. Liu, S., Cui, P., Zhu, W., Yang, S., Tian, Q.: Social embedding image distance learning. In: Proceedings of the 20th ACM international conference on Multimedia (2014)

  18. Liu, Z., Li, H., Zhou, W., Zhao, R., Tian, Q.: Contextual hashing for large-scale image search. Image Process. IEEE Trans. 23(4), 1606–1614 (2014)

  19. Liu, Z., Wang, S., Zheng, L., Tian, Q.: Visual reranking with improved image graph. In: ICASSP, pp. 6889–6893. IEEE (2014)

  20. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

  21. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference, vol. 2, pp. 2161–2168. IEEE (2006)

  22. Niu, Z., Hua, G., Gao, X., Tian, Q.: Context aware topic model for scene recognition. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 2743–2750. IEEE (2012)

  23. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference, pp. 1–8. IEEE (2007)

  24. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: Improving particular object retrieval in large scale image databases. In: Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference, pp. 1–8. IEEE (2008)

  25. Qin, D., Wengert, C., Van Gool, L.: Query adaptive similarity for large scale object retrieval. In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference, pp. 1610–1617. IEEE (2013)

  26. Shahbaz Khan, F., Anwer, R.M., van de Weijer, J., Bagdanov, A.D., Vanrell, M., Lopez, A.M.: Color attributes for object detection. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 3306–3313. IEEE (2012)

  27. Shen, X., Lin, Z., Brandt, J., Avidan, S., Wu, Y.: Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference, pp. 3013–3020. IEEE (2012)

  28. Sivic, J., Zisserman, A.: Video google: A text retrieval approach to object matching in videos. In: Computer Vision, 2003. Proceedings. Ninth IEEE International Conference, pp. 1470–1477. IEEE (2003)

  29. Su, B., Ding, X., Peng, L., Liu, C.: A novel baseline-independent feature set for Arabic handwriting recognition. In: Document Analysis and Recognition (ICDAR), 2013 12th International Conference, pp. 1250–1254. IEEE (2013)

  30. Su, Y., Fu, Y., Gao, X., Tian, Q.: Discriminant learning through multiple principal angles for visual recognition. Image Process. IEEE Trans. 21(3), 1381–1390 (2012)

  31. Su, Y., Tao, D., Li, X., Gao, X.: Texture representation in AAM using Gabor wavelet and local binary patterns. In: Systems, Man and Cybernetics, 2009. SMC 2009. IEEE International Conference, pp. 3274–3279. IEEE (2009)

  32. Wang, D., Lu, H., Yang, M.H.: Online object tracking with sparse prototypes. Image Process. IEEE Trans. 22(1), 314–325 (2013)

  33. Wang, X., Yang, M., Cour, T., Zhu, S., Yu, K., Han, T.X.: Contextual weighting for vocabulary tree based image retrieval. In: Computer Vision (ICCV), 2011 IEEE International Conference, pp. 209–216. IEEE (2011)

  34. Wang, Y., Liu, C., Ding, X.: Similar pattern discriminant analysis for improving Chinese character recognition accuracy. In: Document Analysis and Recognition (ICDAR), 2013 12th International Conference, pp. 1056–1060. IEEE (2013)

  35. Wengert, C., Douze, M., Jégou, H.: Bag-of-colors for improved image search. In: Proceedings of the 19th ACM international conference on Multimedia, pp. 1437–1440. ACM (2011)

  36. Xia, Y., He, K., Wen, F., Sun, J.: Joint inverted index. In: ICCV (2013)

  37. Xie, L., Tian, Q., Zhang, B.: Spatial pooling of heterogeneous features for image classification. Image Process. IEEE Trans. 23(5), 1994–2008 (2013)

  38. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference, pp. 1794–1801. IEEE (2009)

  39. Yang, Y., Liu, J.: Exploring the large-scale TDOA feature space for speaker diarization. In: HCI International 2014-Posters Extended Abstracts, pp. 551–556. Springer (2014)

  40. Yuan, H., Qian, Y., Zhao, J., Liu, J.: Mispronunciation detection with an optimized detection network and multi-layer perception based features. J. Tsinghua Univ. (Sci. Technol.) 4, 027 (2012)

  41. Zhang, S., Yang, M., Cour, T., Yu, K., Metaxas, D.N.: Query specific fusion for image retrieval. In: Computer Vision-ECCV 2012, pp. 660–673. Springer (2012)

  42. Zhang, S., Yang, M., Wang, X., Lin, Y., Tian, Q.: Semantic-aware co-indexing for image retrieval. In: Computer Vision (ICCV), 2013 IEEE International Conference, pp. 1673–1680. IEEE (2013)

  43. Zhang, Y., Jia, Z., Chen, T.: Image retrieval with geometry-preserving visual phrases. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference, pp. 809–816. IEEE (2011)

  44. Zheng, L., Wang, S.: Visual phraselet: Refining spatial constraints for large scale image search. Signal Process. Lett. IEEE 20(4), 391–394 (2013)

  45. Zheng, L., Wang, S., Tian, Q.: Coupled binary embedding for large-scale image retrieval. Image Process. IEEE Trans. 23(8), 3368–3380 (2014)

  46. Zheng, L., Wang, S., Tian, Q.: Lp-norm IDF for scalable image retrieval. Image Process. IEEE Trans. 23(8), 3604–3617 (2014)

  47. Zheng, L., Wang, S., Zhou, W., Tian, Q.: Bayes merging of multiple vocabularies for scalable image retrieval. In: CVPR (2014)

  48. Zhou, W., Lu, Y., Li, H., Tian, Q.: Scalar quantization for large scale image search. In: Proceedings of the 20th ACM international conference on Multimedia, pp. 169–178. ACM (2012)

Acknowledgements

This work was supported by the National High Technology Research and Development Program of China (863 Program) under Grant No. 2012AA011004 and the National Science and Technology Support Program under Grant No. 2013BAK02B04. The work of Dr. Qi Tian was supported in part by ARO grant W911NF-12-1-0057 and by Faculty Research Awards from NEC Laboratories of America. This work was also supported in part by the National Science Foundation of China (NSFC) under Grant No. 61429201.

Author information

Corresponding authors

Correspondence to Shengjin Wang or Qi Tian.

Additional information

Communicated by F. Wu.

About this article

Cite this article

Zheng, L., Wang, S., Guo, P. et al. Tensor index for large scale image retrieval. Multimedia Systems 21, 569–579 (2015). https://doi.org/10.1007/s00530-014-0415-8

  • DOI: https://doi.org/10.1007/s00530-014-0415-8
