Abstract
Existing cross-modal hashing methods ignore informative multimodal joint information and cannot fully exploit semantic labels. In this paper, we propose a deep fused two-step cross-modal hashing (DFTH) framework with multiple semantic supervision. In the first step, DFTH learns unified hash codes for instances with a fusion network; semantic label reconstruction and semantic similarity reconstruction are introduced so that the binary codes are informative, discriminative, and similarity preserving. In the second step, two modality-specific hash networks are trained under the supervision of common hash code reconstruction, label reconstruction, and intra-modal and inter-modal semantic similarity reconstruction, enabling them to generate semantics-preserving binary codes for out-of-sample queries. To avoid the vanishing gradients caused by binarization, the continuous, differentiable tanh function is introduced to approximate the discrete sign function, allowing the networks to be trained end-to-end by automatic gradient computation. Extensive experiments on MIRFlickr25K and NUS-WIDE demonstrate the superiority of DFTH over state-of-the-art methods.
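The tanh relaxation of the sign function admits a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation: it assumes a scale factor beta that is annealed over training (in the spirit of HashNet-style continuation) so that tanh(beta * x) approaches sign(x) while staying differentiable; the class name and annealing schedule are illustrative.

    import torch
    import torch.nn as nn

    class TanhHashLayer(nn.Module):
        """Differentiable surrogate for sign(x): tanh(beta * x) -> sign(x) as beta grows."""

        def __init__(self, beta: float = 1.0):
            super().__init__()
            self.beta = beta

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training:
                # Smooth relaxation: gradients flow through tanh during training.
                return torch.tanh(self.beta * x)
            # Discrete +1/-1 binary codes for retrieval at inference time.
            return torch.sign(x)

    layer = TanhHashLayer()
    for epoch in range(10):
        layer.beta = 1.0 + 0.5 * epoch          # hypothetical annealing schedule
        relaxed = layer(torch.randn(4, 32))     # 4 samples, 32-bit relaxed codes

Increasing beta tightens the relaxation, so the learned codes drift toward binary values while the loss remains differentiable; at test time the hard sign is applied to obtain the final hash codes.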





Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62076073, No. 61902077, No. 62006048, No. 61772141, and No. 61972102), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524, No. 201903010107, and No. 201802010042), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Special Funds for the Cultivation of Guangdong College Students' Scientific and Technological Innovation (pdjh2020a0173), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136001), and the Science and Technology Planning Project of Guangdong Province (No. LZC0023, No. 2019B020208001, and No. 2019B110210002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Kang, P., Lin, Z., Yang, Z. et al. Deep fused two-step cross-modal hashing with multiple semantic supervision. Multimed Tools Appl 81, 15653–15670 (2022). https://doi.org/10.1007/s11042-022-12187-6