Abstract
Owing to their low storage cost and fast search speed, hashing methods are widely used in cross-modal retrieval. However, two crucial bottlenecks remain. First, there is a shortage of suitable large-scale datasets for multimodal data. Second, imbalanced instances degrade the accuracy of the retrieval system. In this paper, we propose an end-to-end self-supervised learning-based weight adaptive hashing method for cross-modal retrieval. To address the dataset limitation, we adopt a self-supervised fashion that directly extracts fine-grained features from labels and uses them to supervise the hash learning of the other modalities. To overcome the problem of imbalanced instances, we design an adaptive weight loss that flexibly adjusts the weights of training samples according to their proportions. In addition, we employ a binary approximation regularization term to reduce the quantization error. Experiments on the MIRFLICKR-25K and NUS-WIDE datasets show that our method improves performance by about 3% compared with other methods.
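The two loss components named above can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration under stated assumptions, not the paper's exact formulation: `adaptive_sample_weights` up-weights samples from under-represented classes in inverse proportion to their frequency, and `binary_approx_reg` penalizes the gap between relaxed (continuous) hash outputs and their binarized codes.

```python
import numpy as np

def adaptive_sample_weights(labels):
    """Weight each training sample inversely to its class frequency,
    so rare (imbalanced) instances contribute more to the loss.
    Hypothetical illustration of the adaptive weight idea."""
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes, counts / len(labels)))
    w = np.array([1.0 / freq[y] for y in labels])
    return w / w.sum() * len(labels)  # normalize so the mean weight is 1

def binary_approx_reg(h):
    """Penalize the gap between continuous hash outputs h in (-1, 1)
    and their binarized codes sign(h), reducing quantization error."""
    b = np.sign(h)
    return np.mean((h - b) ** 2)

labels = np.array([0, 0, 0, 0, 1])           # imbalanced toy labels
w = adaptive_sample_weights(labels)
print(w)                                     # the rare class gets a larger weight

h = np.array([0.9, -0.8, 0.1, -0.95, 0.7])   # relaxed hash-layer outputs
print(binary_approx_reg(h))                  # shrinks as h approaches ±1
```

In a full pipeline, these weights would multiply the per-sample similarity loss and the regularizer would be added with a trade-off coefficient; both details are omitted here for brevity.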
Acknowledgements
This work was supported by the Science and Technology Planning Project of Guangdong Province (2016A040403046) and the Key Technology Program of Shenzhen, China (Nos. JSGG20170823152809704 and JSGG20170824163239586).
Cite this article
Li, Y., Wang, X., Qi, S. et al. Self-supervised learning-based weight adaptive hashing for fast cross-modal retrieval. SIViP 15, 673–680 (2021). https://doi.org/10.1007/s11760-019-01534-0