Semantics-preserving hashing based on multi-scale fusion for cross-modal retrieval

Zhang, Hong; Pan, Min

doi:10.1007/s11042-020-09869-4

Published: 02 November 2020

Volume 80, pages 17299–17314, (2021)

Multimedia Tools and Applications

Hong Zhang^1,2 &
Min Pan^1,2


Abstract

Hash-based cross-modal retrieval has been an active topic in content-based multimedia retrieval. Most deep cross-modal hashing methods consider only the inter-modal loss, which preserves the local information of the training data, and ignore the loss among samples of the same modality, which preserves the global information of the dataset. They also overlook the fact that different scales of single-modality data carry different semantic information, which affects the feature representation. In this paper, we propose a semantics-preserving hashing method based on multi-scale fusion. Specifically, a multi-scale fusion pooling model is proposed for both the image feature training network and the text feature training network. This lets us extract multi-scale features from the image data and addresses the sparsity of text BOW vectors. When constructing the loss function, we consider the intra-modal loss together with the inter-modal loss, so that the output hash codes retain both global and local underlying semantic correlations once the image and text feature training networks are trained. Experimental results on NUS-WIDE and MIRFlickr-25K show that our algorithm improves cross-modal retrieval accuracy over existing methods.
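
The abstract does not give the exact formulation, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. Multi-scale fusion pooling is approximated here by pyramid-style average pooling over a 1-D bag-of-words vector, and the combined objective is approximated by a pairwise negative log-likelihood of the kind used in earlier deep cross-modal hashing work. The function names, the `bins` levels, and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import numpy as np


def multi_scale_pool(x, bins=(1, 2, 4)):
    """Pyramid-style average pooling of a 1-D feature vector (assumed form).

    For each level b in `bins`, the vector is split into b contiguous
    segments and each segment is averaged. Coarse levels turn a sparse
    BOW vector into dense summaries. Every b should be <= len(x).
    """
    x = np.asarray(x, dtype=float)
    pooled = []
    for b in bins:
        # array_split tolerates lengths that are not divisible by b
        pooled.extend(seg.mean() for seg in np.array_split(x, b))
    return np.array(pooled)


def pairwise_nll(F, G, S):
    """Pairwise negative log-likelihood between two sets of real-valued codes.

    F, G: (n, k) code matrices; S: (n, n) 0/1 similarity matrix.
    theta_ij = 0.5 * <F_i, G_j>; the loss is small when similar pairs
    have a large inner product and dissimilar pairs a small one.
    """
    theta = 0.5 * F @ G.T
    # logaddexp(0, theta) is a numerically stable log(1 + exp(theta))
    return float(np.sum(np.logaddexp(0.0, theta) - S * theta))


def total_loss(F_img, F_txt, S, alpha=1.0):
    """Inter-modal term plus an alpha-weighted intra-modal term (assumed weighting)."""
    inter = pairwise_nll(F_img, F_txt, S)
    intra = pairwise_nll(F_img, F_img, S) + pairwise_nll(F_txt, F_txt, S)
    return inter + alpha * intra
```

In this sketch the inter-modal term compares image codes with text codes, while the intra-modal term compares codes within each modality against the same similarity matrix. The actual networks, pooling scales, and loss weights used by the authors may differ.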



Author information

Authors and Affiliations

College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan, 430081, China
Hong Zhang & Min Pan
Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
Hong Zhang & Min Pan


Corresponding author

Correspondence to Hong Zhang.


About this article

Cite this article

Zhang, H., Pan, M. Semantics-preserving hashing based on multi-scale fusion for cross-modal retrieval. Multimed Tools Appl 80, 17299–17314 (2021). https://doi.org/10.1007/s11042-020-09869-4


Received: 13 March 2020
Revised: 18 August 2020
Accepted: 11 September 2020
Published: 02 November 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s11042-020-09869-4
