Abstract
Existing cross-modal hashing methods ignore informative multimodal joint information and cannot fully exploit semantic labels. In this paper, we propose a deep fused two-step cross-modal hashing (DFTH) framework with multiple semantic supervision. In the first step, DFTH learns unified hash codes for instances with a fusion network; semantic label reconstruction and semantic similarity reconstruction are introduced so that the binary codes are informative, discriminative, and similarity preserving. In the second step, two modality-specific hash networks are trained under the supervision of common hash code reconstruction, label reconstruction, and intra-modal and inter-modal semantic similarity reconstruction, enabling them to generate semantics-preserving binary codes for out-of-sample queries. To avoid the vanishing gradients caused by binarization, the continuous, differentiable tanh function is introduced to approximate the discrete sign function, allowing the networks to be trained end-to-end by automatic gradient computation. Extensive experiments on MIRFlickr25K and NUS-WIDE demonstrate the superiority of DFTH over state-of-the-art methods.
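The tanh relaxation of the sign function admits a compact illustration. Below is a minimal PyTorch sketch, not the authors' implementation: it assumes a scale factor beta that is annealed over training (in the spirit of HashNet-style continuation) so that tanh(beta * x) approaches sign(x) while staying differentiable; the class name and annealing schedule are illustrative.

    import torch
    import torch.nn as nn

    class TanhHashLayer(nn.Module):
        """Differentiable surrogate for sign(x): tanh(beta * x) -> sign(x) as beta grows."""

        def __init__(self, beta: float = 1.0):
            super().__init__()
            self.beta = beta

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training:
                # Smooth relaxation: gradients flow through tanh during training.
                return torch.tanh(self.beta * x)
            # Discrete +1/-1 binary codes for retrieval at inference time.
            return torch.sign(x)

    layer = TanhHashLayer()
    for epoch in range(10):
        layer.beta = 1.0 + 0.5 * epoch          # hypothetical annealing schedule
        relaxed = layer(torch.randn(4, 32))     # 4 samples, 32-bit relaxed codes

Increasing beta tightens the relaxation, so the learned codes drift toward binary values while the loss remains differentiable; at test time the hard sign is applied to obtain the final hash codes.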





Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62076073, No. 61902077, No. 62006048, No. 61772141, and No. 61972102), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524, No. 201903010107, and No. 201802010042), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Special Funds for the Cultivation of Guangdong College Students' Scientific and Technological Innovation (pdjh2020a0173), the Key-Area Research and Development Program of Guangdong Province (No. 2019B010136001), and the Science and Technology Planning Project of Guangdong Province (No. LZC0023, No. 2019B020208001, and No. 2019B110210002).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Kang, P., Lin, Z., Yang, Z. et al. Deep fused two-step cross-modal hashing with multiple semantic supervision. Multimed Tools Appl 81, 15653–15670 (2022). https://doi.org/10.1007/s11042-022-12187-6