Subspace-based multi-view fusion for instance-level image retrieval

Original Article · The Visual Computer

Abstract

In this paper, we address the problem of fusing multiple features for instance-level image retrieval. Having achieved tremendous success in recent retrieval tasks, convolutional neural network (CNN) features encode high-level image content and clearly outperform hand-crafted shallow image signatures. However, shallow features still play a beneficial role in visual matching, particularly when dramatic variations in viewpoint and scale are present, since they inherit a degree of invariance from robust local descriptors such as the scale-invariant feature transform (SIFT). It is therefore important to leverage the mutual correlation between these two heterogeneous signatures for effective visual representation. Since this remains an open problem, we propose a subspace-based multi-view fusion strategy in which a shared subspace is uncovered from the original high-dimensional features, yielding a compact latent representation. Experiments on six public benchmark datasets show that the proposed method outperforms classical fusion approaches and achieves state-of-the-art performance.
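
The shared-subspace idea can be illustrated with a minimal sketch. The example below assumes two precomputed views per image, a CNN descriptor and a SIFT-based aggregate (e.g., VLAD), and uses canonical correlation analysis (CCA) as a stand-in for a shared-subspace learner; the feature dimensions, the use of scikit-learn's CCA, and the concatenate-then-normalize fusion step are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: fusing a CNN descriptor (view 1) and a SIFT-based
# descriptor (view 2) through a shared latent subspace, with CCA standing in
# for the subspace model. All shapes and choices below are assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_images, d_cnn, d_sift, d_latent = 500, 512, 1024, 64

# Stand-in features; in practice these would be pooled CNN activations and
# aggregated SIFT descriptors (e.g., VLAD or Fisher vectors), one row per image.
X_cnn = rng.standard_normal((n_images, d_cnn))
X_sift = rng.standard_normal((n_images, d_sift))

# Learn projections mapping both views into a shared d_latent-dimensional subspace.
cca = CCA(n_components=d_latent, max_iter=1000)
cca.fit(X_cnn, X_sift)

# Project each view, then fuse (concatenate and L2-normalize) to obtain a
# compact latent representation suitable for cosine-similarity retrieval.
Z_cnn, Z_sift = cca.transform(X_cnn, X_sift)
fused = normalize(np.hstack([Z_cnn, Z_sift]))

# Retrieval: rank database images by cosine similarity to the query.
query, database = fused[0], fused[1:]
ranking = np.argsort(-database @ query)
```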



Acknowledgements

The authors would like to thank the associate editor and all anonymous reviewers for their valuable and constructive comments, which helped improve the quality of the manuscript.

Author information


Corresponding author

Correspondence to Bo Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China under Grants 61703096, 61921004, U1713209, 61773117, 61803212, the Natural Science Foundation of Jiangsu Province under Grants BK20170691, BK20180744 and the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant 18KJB520034.


About this article


Cite this article

Li, J., Yang, B., Yang, W. et al. Subspace-based multi-view fusion for instance-level image retrieval. Vis Comput 37, 619–633 (2021). https://doi.org/10.1007/s00371-020-01828-2

