
Deep fused two-step cross-modal hashing with multiple semantic supervision

Published in Multimedia Tools and Applications

Abstract

Existing cross-modal hashing methods ignore informative multimodal joint information and cannot fully exploit semantic labels. In this paper, we propose a deep fused two-step cross-modal hashing (DFTH) framework with multiple semantic supervision. In the first step, DFTH learns unified hash codes for instances through a fusion network, where semantic label reconstruction and similarity reconstruction are introduced to obtain binary codes that are informative, discriminative, and semantic-similarity preserving. In the second step, two modality-specific hash networks are learned under the supervision of common hash code reconstruction, label reconstruction, and intra-modal and inter-modal semantic similarity reconstruction, so that they can generate semantics-preserving binary codes for out-of-sample queries. To deal with the vanishing gradients caused by binarization, the continuous and differentiable tanh function is introduced to approximate the discrete sign function, enabling the networks to back-propagate through automatic gradient computation. Extensive experiments on MIRFlickr25K and NUS-WIDE show the superiority of DFTH over state-of-the-art methods.
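As a rough illustration of the relaxation described in the abstract (not the authors' released code), the sketch below shows how a fusion branch can emit relaxed hash codes through a scaled tanh in place of the non-differentiable sign, together with a pairwise similarity-reconstruction loss. All layer sizes, the scale parameter `beta`, and the names `FusionHashHead` and `similarity_reconstruction_loss` are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class FusionHashHead(nn.Module):
    """Sketch of a fusion branch: joint image-text features -> relaxed hash codes.

    tanh(beta * x) is a smooth surrogate for sign(x): gradients flow through it
    during training, and codes are binarized with sign(.) only at query time.
    Dimensions and beta are assumed values for illustration.
    """
    def __init__(self, img_dim=4096, txt_dim=1386, code_len=32, beta=1.0):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, code_len),
        )
        self.beta = beta  # larger beta pushes tanh(beta * x) closer to sign(x)

    def forward(self, img_feat, txt_feat):
        joint = torch.cat([img_feat, txt_feat], dim=1)  # multimodal joint input
        return torch.tanh(self.beta * self.fuse(joint))  # relaxed codes in (-1, 1)

def similarity_reconstruction_loss(codes, S):
    """Match rescaled code inner products to a semantic similarity matrix S.

    S[i, j] is +1 for semantically similar pairs and -1 otherwise; dividing
    the inner product by the code length keeps the estimate in [-1, 1].
    """
    k = codes.size(1)
    sim_hat = codes @ codes.t() / k
    return ((sim_hat - S) ** 2).mean()

# Toy usage with random features; at retrieval time, binary codes would be
# obtained as torch.sign(head(img, txt)).
head = FusionHashHead()
img, txt = torch.randn(8, 4096), torch.randn(8, 1386)
codes = head(img, txt)
S = torch.randint(0, 2, (8, 8)).float() * 2 - 1  # placeholder {-1, +1} similarity
loss = similarity_reconstruction_loss(codes, S)
loss.backward()  # gradients flow thanks to the tanh relaxation
```

In the second step of the framework, the same relaxation would let each modality-specific network regress toward the unified codes learned here while preserving the label and similarity supervision described above.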




Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62076073, No. 61902077, No. 62006048, No. 61772141, and No. 61972102), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524, No. 201903010107, and No. 201802010042), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Special Funds for the Cultivation of Guangdong College Students’ Scientific and Technological Innovation (No. pdjh2020a0173), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136001), and the Science and Technology Planning Project of Guangdong Province (No. LZC0023, No. 2019B020208001, and No. 2019B110210002).

Author information

Corresponding authors

Correspondence to Zhenguo Yang or Wenyin Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kang, P., Lin, Z., Yang, Z. et al. Deep fused two-step cross-modal hashing with multiple semantic supervision. Multimed Tools Appl 81, 15653–15670 (2022). https://doi.org/10.1007/s11042-022-12187-6

