
Narrowing the variance of variational cross-encoder for cross-modal hashing

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Cross-modal hashing, which embeds data into binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the number of queries is much larger than the size of the training set, and queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as the cross-encoder and the autoencoder. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from noise, the proposed model generalizes better to test data. Ideally, each distribution describes a category of data, and samples drawn from it generate data of the same category; under this expectation, the means and variances could serve as real-valued codes for the input data. In practice, however, the generated data generally do not belong to the same category as the input data. Hence, we add a penalty term on the variance output of the VCE and use the means as real-valued codes from which hash codes are generated. Experiments on three widely used datasets validate the effectiveness of our method.
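To make the bottleneck design concrete, the sketch below illustrates how a VCE-style encoder could output a mean and a log-variance per input, sample a code via the reparameterization trick, penalize the variance, and binarize the means into hash codes. It is a minimal PyTorch sketch under stated assumptions; the layer sizes, the exact form of the variance penalty, and the sign-based binarization are illustrative choices, not the configuration reported in the paper.

# Hypothetical sketch of a variational cross-encoder (VCE) bottleneck with a
# variance penalty. The layer sizes, the form of the penalty, and the
# sign-based binarization are illustrative assumptions.
import torch
import torch.nn as nn


class VCEBottleneck(nn.Module):
    """Encodes one modality into a Gaussian (mu, log_var) at the bottleneck."""

    def __init__(self, in_dim: int = 4096, code_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU())
        self.fc_mu = nn.Linear(1024, code_dim)       # mean branch
        self.fc_logvar = nn.Linear(1024, code_dim)   # log-variance branch

    def forward(self, x):
        h = self.backbone(x)
        mu, log_var = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so the decoder is trained on diversified samples of the code.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        return z, mu, log_var


def variance_penalty(log_var):
    # Assumed penalty: mean of the per-sample summed variances, which shrinks
    # sigma^2 so that sampled codes stay close to the mean code.
    return torch.exp(log_var).sum(dim=1).mean()


def to_hash_codes(mu):
    # Binarize the real-valued code (the mean) into hash codes via sign.
    return torch.sign(mu)


if __name__ == "__main__":
    enc = VCEBottleneck(in_dim=4096, code_dim=64)
    x = torch.randn(8, 4096)              # a batch of image features
    z, mu, log_var = enc(x)
    penalty = variance_penalty(log_var)   # added to the training objective
    codes = to_hash_codes(mu)             # binary codes in {-1, +1}
    print(codes.shape, penalty.item())

Narrowing sigma^2 in this way pushes sampled codes toward the means, so the means themselves become reliable real-valued codes for binarization, which is the intuition behind the variance penalty described in the abstract.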


Notes

  1. http://www.svcl.ucsd.edu/projects/crossmodal/

  2. https://press.liacs.nl/mirflickr/mirdownload.html

  3. https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html


Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62076204, and in part by the Natural Science Foundation of Shaanxi Province under Grant 2020JQ-197.

Author information

Authors and Affiliations

Authors

Contributions

DT conceived the presented idea and performed the analytic calculations. DT, YC, YW, and DZ contributed to the design and implementation of the research, to the analysis of the results, and to the writing of the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yiqin Cao.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by Y. Zhang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tian, D., Cao, Y., Wei, Y. et al. Narrowing the variance of variational cross-encoder for cross-modal hashing. Multimedia Systems 29, 3421–3430 (2023). https://doi.org/10.1007/s00530-023-01161-3
