
Unsupervised hash retrieval based on multiple similarity matrices and text self-attention mechanism

Published in Applied Intelligence

Abstract

Cross-modal retrieval aims to measure the similarity between data of different modalities, while hash-based retrieval improves retrieval efficiency. This paper proposes an unsupervised cross-modal hash retrieval method based on multiple similarity matrices. Fusion features are constructed by a weighted combination of hash features and original features. From these three kinds of features, an auxiliary similarity matrix is built for each. Finally, a fusion matrix is constructed by a weighted combination of the similarity matrices of the original features and the hash features. These four matrices differ both in the form of the features they use and in how they are constructed, and together they concentrate the similarity information of the modalities. Loss functions are computed both between different similarity matrices and between different modalities. Because most models extract text features in only a single way, this paper applies a text self-attention mechanism to strengthen text feature extraction, which effectively improves the final performance. The method is evaluated on the Wikipedia, MIRFlickr, and NUS-WIDE datasets, and the results show that it has clear advantages over recent state-of-the-art methods.
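The weighted combination of similarity matrices described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the use of cosine similarity, the fusion weight `alpha`, the feature dimensions, and the random stand-in features are all assumptions made for the example.

```python
import numpy as np

def cosine_similarity_matrix(feats):
    """Pairwise cosine similarity for a batch of feature vectors."""
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return normed @ normed.T

rng = np.random.default_rng(0)
# Hypothetical stand-ins for two of the feature forms named in the abstract:
# original (continuous) features and binary hash features.
original_feats = rng.normal(size=(8, 64))          # 8 samples, 64-dim features
hash_feats = np.sign(rng.normal(size=(8, 16)))     # 8 samples, 16-bit codes

S_orig = cosine_similarity_matrix(original_feats)  # similarity of original features
S_hash = cosine_similarity_matrix(hash_feats)      # similarity of hash features

# Fusion matrix: weighted combination of the two similarity matrices,
# with a hypothetical weight alpha.
alpha = 0.6
S_fused = alpha * S_orig + (1 - alpha) * S_hash

print(S_fused.shape)  # one similarity score per sample pair
```

In a training loop, a loss could then be taken between `S_fused` and the similarity matrices of the individual modalities, which is the role the fusion matrix plays in the abstract's description.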





Acknowledgments

This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61866004, 61762015), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Postgraduate Education Innovation Foundation (No. JXXYYJSCXXM-2021-008), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information


Corresponding author

Correspondence to Zhixin Li.

Ethics declarations

Conflict of Interests

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hou, C., Li, Z. & Wu, J. Unsupervised hash retrieval based on multiple similarity matrices and text self-attention mechanism. Appl Intell 52, 7670–7685 (2022). https://doi.org/10.1007/s10489-021-02804-6

