Abstract
Cross-modal retrieval aims to measure the similarity between data of different modalities, while hashing improves retrieval efficiency. This paper proposes an unsupervised cross-modal hash retrieval method based on multiple similarity matrices. Fusion features are constructed through a weighted combination of hash features and original features, and an auxiliary similarity matrix is then built for each of the three feature types. Finally, a fusion matrix is constructed as a weighted combination of the similarity matrices of the original features and the hash features. These four matrices, which differ in both the form of the features and the construction method, concentrate the similarity information of the different modalities; from them, the loss between different similarity matrices and the loss between different modalities are computed. Considering that most models extract text features in only a single way, this paper also applies text self-attention to strengthen text feature extraction, which further improves the final performance. To verify the proposed method, experiments are conducted on the Wikipedia, MIRFlickr, and NUS-WIDE datasets, and the results show clear advantages over recent state-of-the-art methods.
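The core construction described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's actual implementation: the feature dimensions, the projection matrix `W`, and the weights `alpha` and `beta` are hypothetical, and the loss shown is a simple Frobenius-style distance standing in for the paper's loss functions.

```python
import numpy as np

def cosine_similarity_matrix(feats):
    # Row-normalize, then S = F F^T gives pairwise cosine similarities.
    norm = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return norm @ norm.T

# Toy batch for one modality: original deep features and relaxed hash features.
rng = np.random.default_rng(0)
F_orig = rng.standard_normal((4, 8))            # original features
F_hash = np.tanh(rng.standard_normal((4, 16)))  # relaxed hash codes in (-1, 1)

# Fusion features: weighted combination of the two feature forms.
# (Dimensions must match, so the hash features are projected here.)
W = rng.standard_normal((16, 8))  # hypothetical projection
alpha = 0.6                       # hypothetical fusion weight
F_fuse = alpha * F_orig + (1 - alpha) * (F_hash @ W)

# Auxiliary similarity matrix for each of the three feature types.
S_orig = cosine_similarity_matrix(F_orig)
S_hash = cosine_similarity_matrix(F_hash)
S_fuse = cosine_similarity_matrix(F_fuse)

# Fourth matrix: weighted combination of the original- and hash-feature
# similarity matrices.
beta = 0.5  # hypothetical fusion weight
S_comb = beta * S_orig + (1 - beta) * S_hash

# A loss between two similarity matrices (stand-in for the paper's losses).
loss = float(np.mean((S_comb - S_fuse) ** 2))
print(S_comb.shape, loss >= 0.0)
```

In the paper this construction is applied per modality, and additional losses tie the matrices of the image and text branches together; the sketch only shows how the four matrices relate for a single modality.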
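The text self-attention component mentioned above is standard scaled dot-product self-attention over token features. The following is a minimal sketch with made-up dimensions and randomly initialized projection matrices, not the paper's trained module.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a sequence of token features:
    # each output token is a similarity-weighted mix of all value vectors.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 12))  # 5 text tokens, 12-dim embeddings (hypothetical)
Wq, Wk, Wv = (rng.standard_normal((12, 12)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # output keeps the input sequence length and dimension
```

Because every token attends to every other token, the resulting text features capture sentence-level context rather than a single pooled representation.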
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61866004, 61762015), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Postgraduate Education Innovation Foundation (No. JXXYYJSCXXM-2021-008), the Guangxi "Bagui Scholar" Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Ethics declarations
Conflict of Interests
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hou, C., Li, Z. & Wu, J. Unsupervised hash retrieval based on multiple similarity matrices and text self-attention mechanism. Appl Intell 52, 7670–7685 (2022). https://doi.org/10.1007/s10489-021-02804-6