Subspace-based multi-view fusion for instance-level image retrieval

Original Article · The Visual Computer

Abstract

In this paper, we address the problem of fusing multiple features for instance-level image retrieval. Having achieved tremendous success in recent retrieval tasks, convolutional neural network (CNN) features encode high-level image content and clearly outperform hand-crafted shallow image signatures. However, shallow features still play a beneficial role in visual matching, particularly when dramatic variations in viewpoint and scale are present, since they inherit a degree of invariance from robust local descriptors such as the scale-invariant feature transform (SIFT). It is therefore important to leverage the mutual correlation between these two heterogeneous signatures for effective visual representation. Since this remains an open problem, we propose a subspace-based multi-view fusion strategy in which a shared subspace is uncovered from the original high-dimensional features, yielding a compact latent representation. Experiments on six public benchmark datasets show that the proposed method outperforms classical fusion approaches and achieves state-of-the-art performance.
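
The shared-subspace idea can be illustrated with a minimal sketch. The example below assumes two precomputed views per image, a CNN descriptor and a SIFT-based aggregate (e.g., VLAD), and uses canonical correlation analysis (CCA) as a stand-in for a shared-subspace learner; the feature dimensions, the use of scikit-learn's CCA, and the concatenate-then-normalize fusion step are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: fusing a CNN descriptor (view 1) and a SIFT-based
# descriptor (view 2) through a shared latent subspace, with CCA standing in
# for the subspace model. All shapes and choices below are assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_images, d_cnn, d_sift, d_latent = 500, 512, 1024, 64

# Stand-in features; in practice these would be pooled CNN activations and
# aggregated SIFT descriptors (e.g., VLAD or Fisher vectors), one row per image.
X_cnn = rng.standard_normal((n_images, d_cnn))
X_sift = rng.standard_normal((n_images, d_sift))

# Learn projections mapping both views into a shared d_latent-dimensional subspace.
cca = CCA(n_components=d_latent, max_iter=1000)
cca.fit(X_cnn, X_sift)

# Project each view, then fuse (concatenate and L2-normalize) to obtain a
# compact latent representation suitable for cosine-similarity retrieval.
Z_cnn, Z_sift = cca.transform(X_cnn, X_sift)
fused = normalize(np.hstack([Z_cnn, Z_sift]))

# Retrieval: rank database images by cosine similarity to the query.
query, database = fused[0], fused[1:]
ranking = np.argsort(-database @ query)
```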



Acknowledgements

The authors would like to thank the associate editor and all anonymous reviewers for their valuable and constructive comments, which helped improve the quality of the manuscript.

Author information


Corresponding author

Correspondence to Bo Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is supported by the National Natural Science Foundation of China under Grants 61703096, 61921004, U1713209, 61773117, 61803212, the Natural Science Foundation of Jiangsu Province under Grants BK20170691, BK20180744 and the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant 18KJB520034.


About this article


Cite this article

Li, J., Yang, B., Yang, W. et al. Subspace-based multi-view fusion for instance-level image retrieval. Vis Comput 37, 619–633 (2021). https://doi.org/10.1007/s00371-020-01828-2

