Label guided correlation hashing for large-scale cross-modal retrieval

Dong, Guohua; Zhang, Xiang; Lan, Long; Wang, Shiwei; Luo, Zhigang

doi:10.1007/s11042-019-7192-5

Published: 06 February 2019

Multimedia Tools and Applications
Volume 78, pages 30895–30922 (2019)

Guohua Dong^1,2, Xiang Zhang^2,3, Long Lan^2,3, Shiwei Wang^1,2 & Zhigang Luo^1,2

Abstract

With the explosive growth of multimedia data such as text and images, large-scale cross-modal retrieval has attracted increasing attention from the vision community. However, it still faces two problems: the so-called “media gap” and search efficiency. One leading family of existing cross-modal retrieval methods alleviates these problems by capturing the correlations across modalities while learning hashing codes. These methods, however, usually consider supervised label information independently, either when generating hashing codes or when learning the hashing function. To address this, we propose a label guided correlation cross-modal hashing method (LGCH), which investigates an alternative way to exploit label information for effective cross-modal retrieval from two aspects: 1) LGCH learns a discriminative common latent representation across modalities by jointly applying generalized canonical correlation analysis (GCCA) and a linear classifier; 2) to generate binary codes and the hashing function simultaneously, LGCH introduces an adaptive parameter that fuses the common latent representation and the label guided representation. Moreover, each subproblem of LGCH has an analytical solution. Cross-modal retrieval experiments on three multimedia datasets show that LGCH performs favorably against many well-established baselines.
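To make the retrieval setting concrete, the following is a minimal sketch of the query stage that linear cross-modal hashing methods generally share: each modality is projected into a common space, thresholded into binary codes, and ranked by Hamming distance. It is not the LGCH objective or its solver. The function names, the toy data, and the pseudo-inverse projections are assumptions made purely for illustration; a trained method would learn the projections from data and labels.

```python
import numpy as np


def hash_codes(features, projection):
    """Project features linearly, then threshold at zero to get {0, 1} bits."""
    return (features @ projection > 0).astype(np.uint8)


def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to one query code."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances


def build_toy_codes(n_items=6, n_bits=16, seed=0):
    """Hypothetical toy setup: both modalities are linear views of one latent matrix."""
    rng = np.random.default_rng(seed)
    latent = rng.choice([-1.0, 1.0], size=(n_items, n_bits))
    mix_img = rng.standard_normal((n_bits, 32))  # latent -> image features
    mix_txt = rng.standard_normal((n_bits, 24))  # latent -> text features
    image_feats = latent @ mix_img
    text_feats = latent @ mix_txt
    # Placeholder hashing functions (an assumption): pseudo-inverses that map
    # each modality back to the shared latent space before thresholding.
    codes_img = hash_codes(image_feats, np.linalg.pinv(mix_img))
    codes_txt = hash_codes(text_feats, np.linalg.pinv(mix_txt))
    return codes_img, codes_txt


if __name__ == "__main__":
    codes_img, codes_txt = build_toy_codes()
    # Text-to-image retrieval: use the text code of item 2 to rank all image codes.
    order, distances = hamming_rank(codes_txt[2], codes_img)
    print("ranking:", order, "distances:", distances)
```

Because both toy modalities are generated from the same latent matrix, the paired image and text items receive identical codes, so a text query finds its paired image at Hamming distance zero. Real image and text features are not related this cleanly, which is the "media gap" the abstract refers to.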

Acknowledgements

This work was supported by the National Natural Science Foundation of China under grants 61806213 and U1435222.

Author information

Authors and Affiliations

Science and Technology on Parallel and Distributed Laboratory, National University of Defense Technology, Changsha, Hunan, 410073, People’s Republic of China
Guohua Dong, Shiwei Wang & Zhigang Luo
College of Computer, National University of Defense Technology, Changsha, Hunan, 410073, People’s Republic of China
Guohua Dong, Xiang Zhang, Long Lan, Shiwei Wang & Zhigang Luo
Institute for Quantum Information & State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, 410073, People’s Republic of China
Xiang Zhang & Long Lan

Corresponding authors

Correspondence to Xiang Zhang, Long Lan or Zhigang Luo.

About this article

Cite this article

Dong, G., Zhang, X., Lan, L. et al. Label guided correlation hashing for large-scale cross-modal retrieval. Multimed Tools Appl 78, 30895–30922 (2019). https://doi.org/10.1007/s11042-019-7192-5

Received: 02 September 2018
Revised: 10 December 2018
Accepted: 09 January 2019
Published: 06 February 2019
Issue Date: November 2019
DOI: https://doi.org/10.1007/s11042-019-7192-5
