Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval

Applied Intelligence

Abstract

Cross-modal retrieval aims to retrieve related items in one modality using a query from another modality. Image-text retrieval, a foundational and central task in this area, has attracted significant research interest. In recent years, hashing techniques have been widely adopted for retrieval over large-scale datasets because of their minimal storage requirements and fast query processing. However, existing hashing approaches learn either unified representations shared by both modalities or specific representations within each modality. The former lack modality-specific information, while the latter do not consider the relationships between paired images and texts across modalities. Therefore, we propose a supervised hashing method that leverages intra-modality and inter-modality matrix factorization. The method integrates semantic labels into the hash code learning process and learns both inter-modality and intra-modality relationships within a unified framework, with the objective of preserving inter-modal complementarity and intra-modal consistency in multimodal data. Our approach involves: (1) mapping data from the different modalities into a shared latent semantic space through inter-modality matrix factorization to derive unified hash codes, and (2) mapping data from each modality into a modality-specific latent semantic space via intra-modality matrix factorization to obtain modality-specific hash codes. The unified and modality-specific codes are subsequently merged to construct the final hash codes. Experimental results demonstrate that our approach surpasses several state-of-the-art cross-modal image-text retrieval hashing methods. Ablation studies further validate the effectiveness of each component of the model.
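The two-step scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the abstract gives no objective function, so everything here is an assumption, including the ridge-regularized alternating least squares updates, the function names `factorize` and `hash_codes`, the code lengths, and the choice to merge codes by simple concatenation. The semantic-label term and the hash functions for unseen queries are omitted entirely.

```python
import numpy as np

def factorize(mats, k, iters=30, reg=1e-2, seed=0):
    """Approximate every X_m (d_m x n) as U_m @ V with ONE shared V (k x n).

    Hypothetical helper: ridge-regularized alternating least squares.
    With a single matrix in `mats` this is a plain (intra-modality)
    factorization; with several it is a collective (inter-modality) one.
    """
    rng = np.random.default_rng(seed)
    n = mats[0].shape[1]
    V = rng.standard_normal((k, n))
    I = np.eye(k)
    for _ in range(iters):
        # Fix V, solve for each basis: U_m = X_m V^T (V V^T + reg I)^-1
        G = np.linalg.inv(V @ V.T + reg * I)
        Us = [X @ V.T @ G for X in mats]
        # Fix all U_m, solve for V: (sum U_m^T U_m + reg I) V = sum U_m^T X_m
        A = sum(U.T @ U for U in Us) + reg * I
        B = sum(U.T @ X for U, X in zip(Us, mats))
        V = np.linalg.solve(A, B)
    return Us, V

def hash_codes(X_img, X_txt, k_shared=8, k_specific=4):
    """Return (k_shared + 2 * k_specific) x n codes in {-1, +1}."""
    # (1) Inter-modality step: both modalities share one latent matrix.
    _, V_shared = factorize([X_img, X_txt], k_shared)
    # (2) Intra-modality step: each modality gets its own latent matrix.
    _, V_img = factorize([X_img], k_specific, seed=1)
    _, V_txt = factorize([X_txt], k_specific, seed=2)
    # Merge by concatenation (an assumption), center each bit, then binarize.
    latent = np.vstack([V_shared, V_img, V_txt])
    latent = latent - latent.mean(axis=1, keepdims=True)
    return np.where(latent >= 0, 1, -1).astype(np.int8)

# Toy paired data: 50 image-text pairs, 20-d image and 10-d text features.
rng = np.random.default_rng(0)
X_img = rng.standard_normal((20, 50))
X_txt = rng.standard_normal((10, 50))
B = hash_codes(X_img, X_txt)
print(B.shape)  # (16, 50)
```

Each column of `B` is the code for one image-text pair, so Hamming distance between columns can serve as the retrieval score. Concatenation keeps the shared and modality-specific bits separate, which makes the contribution of each part easy to inspect; the paper may merge them differently.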

Data availability and access

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the Humanities and Social Sciences Project of Education Ministry (20YJA870013) and the Scientific Research Studio in Colleges and Universities of Ji’nan City (202228105). The authors would like to thank the reviewers for their helpful comments.

Author information

Contributions

DongXue Shi: Writing - original draft, Conceptualization, Methodology, Software, Investigation, Validation. Zheng Liu: Conceptualization, Writing - review & editing, Supervision, Project administration, Funding acquisition. Shanshan Gao: Writing - review & editing, Funding acquisition. Ang Li: Software, Writing - review & editing, Supervision.

Corresponding author

Correspondence to Zheng Liu.

Ethics declarations

Competing interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical and informed consent for data used

The work described in this manuscript is original and has not been under consideration for publication elsewhere. All authors read and approved the final manuscript. The research in this manuscript does not involve human participants or animals. We obtain ethical and informed consent from data subjects before collecting, using, or disclosing their personal data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, D., Liu, Z., Gao, S. et al. Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval. Appl Intell 55, 5 (2025). https://doi.org/10.1007/s10489-024-06060-2

  • DOI: https://doi.org/10.1007/s10489-024-06060-2

Keywords