Abstract
Cross-modal hashing, which embeds data into binary codes, is an efficient tool for retrieving heterogeneous but correlated multimedia data. In real applications, the query set is much larger than the training set and queries may be dissimilar to the training data, which exposes the limited generalization of deterministic models such as cross-encoders and autoencoders. In this paper, we design a variational cross-encoder (VCE), a generative model, to tackle this problem. At the bottleneck layer, the VCE outputs distributions parameterized by means and variances. Because the VCE can generate diversified data from injected noise, the proposed model generalizes better to test data. Ideally, each distribution describes a category of data, and samples drawn from it generate data of the same category; under this assumption, the means and variances can serve as real-valued codes for the input data. In practice, however, the generated data often do not belong to the same category as the input. We therefore add a penalty term on the variance output of the VCE and use the means as real-valued codes from which the hash codes are generated. Experiments on three widely used datasets validate the effectiveness of our method.
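To make the described bottleneck concrete, the sketch below shows a minimal variational cross-encoder layer with the reparameterization trick and a penalty that shrinks the predicted variances, so that the means can be binarized into hash codes. This is only an illustrative sketch: the layer sizes, the weight `lam`, and the exact form of the variance penalty are assumptions and not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VCEBottleneck(nn.Module):
    """Minimal sketch of a variational cross-encoder bottleneck.

    The encoder maps one modality (e.g., image features) to a mean and a
    log-variance, the reparameterization trick draws a noisy latent code,
    and the decoder reconstructs the other modality (e.g., text features).
    All dimensions are illustrative assumptions.
    """

    def __init__(self, in_dim=4096, code_dim=64, out_dim=1386):
        super().__init__()
        self.enc = nn.Linear(in_dim, 512)
        self.fc_mu = nn.Linear(512, code_dim)      # means -> real-valued codes
        self.fc_logvar = nn.Linear(512, code_dim)  # variances to be penalized
        self.dec = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(), nn.Linear(512, out_dim)
        )

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar


def loss_fn(recon, target, mu, logvar, lam=0.1):
    """Reconstruction loss plus a term that narrows the predicted variances.

    The variance penalty used here (mean of exp(logvar)) is a hedged
    stand-in; the paper's exact penalty may differ.
    """
    recon_loss = F.mse_loss(recon, target)
    var_penalty = torch.exp(logvar).mean()
    return recon_loss + lam * var_penalty


# Hash codes are then obtained by binarizing the mean vectors:
# _, mu, _ = model(x); codes = torch.sign(mu)
```

In this reading, narrowing the variances keeps the samples drawn at the bottleneck close to the means, so the generated data stay in the input's category and the means remain reliable real-valued codes for hashing.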


Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62076204; in part by the Natural Science Foundation of Shaanxi Province under Grant 2020JQ-197.
Author information
Contributions
DT conceived the presented idea and performed the analytic calculations. DT, YC, YW, and DZ contributed to the design and implementation of the research, to the analysis of the results, and to the writing of the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by Y. Zhang.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tian, D., Cao, Y., Wei, Y. et al. Narrowing the variance of variational cross-encoder for cross-modal hashing. Multimedia Systems 29, 3421–3430 (2023). https://doi.org/10.1007/s00530-023-01161-3