Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval

Shi, Dongxue; Liu, Zheng; Gao, Shanshan; Li, Ang

doi:10.1007/s10489-024-06060-2

Published: 19 November 2024

Applied Intelligence, Volume 55, article number 5 (2025)

Dongxue Shi^1,2, Zheng Liu^1,3 (ORCID: orcid.org/0000-0002-6846-5782), Shanshan Gao^1,3 & Ang Li^1,3

Abstract

Cross-modal retrieval aims to retrieve relevant items in one modality using a query from another modality. Image-text retrieval, a foundational and central problem in this area, has attracted considerable research interest. In recent years, hashing techniques have been widely adopted for large-scale retrieval because of their low storage cost and fast query processing. However, existing hashing approaches learn either unified representations shared by both modalities or specific representations within each modality. The former lose modality-specific information, while the latter ignore the relationships between image-text pairs across modalities. We therefore propose a supervised hashing method based on intra-modality and inter-modality matrix factorization. The method integrates semantic labels into hash code learning so that inter-modality and intra-modality relationships are captured within a single framework, with the goal of preserving inter-modal complementarity and intra-modal consistency in multimodal data. Our approach (1) maps data from all modalities into a shared latent semantic space through inter-modality matrix factorization to derive unified hash codes, and (2) maps the data of each modality into a modality-specific latent semantic space through intra-modality matrix factorization to obtain modality-specific hash codes. The two sets of codes are then fused to construct the final hash codes. Experimental results demonstrate that our approach outperforms several state-of-the-art cross-modal image-text hashing methods, and ablation studies validate the effectiveness of each component of the model.
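The abstract describes the two factorizations only at a high level, so the following Python sketch (using NumPy) illustrates the general idea under stated assumptions rather than reproducing the authors' objective or solver. It assumes, hypothetically, that each modality X_m (features × items) is factorized as X_m ≈ U_m V with one latent matrix V shared by the image features, the text features and a one-hot label matrix Y (a collective matrix factorization solved here by ridge-regularized alternating least squares); that each modality is also factorized on its own, again with Y as guidance, to give modality-specific factors; that hash codes are the signs of the latent factors; and that fusion is plain concatenation of unified and modality-specific bits. The names collective_mf, to_codes, fused_codes and hamming, the code lengths and the label weight beta are illustrative and are not taken from the paper.

```python
# Illustrative sketch only: the objective, solver and fusion rule are assumptions,
# not the method published in the paper.
import numpy as np


def collective_mf(views, weights, k, lam=1e-2, iters=30, seed=0):
    """Ridge-regularized ALS for X_m ~= U_m @ V with one latent V shared by all views.

    views: list of (d_m, n) arrays over the same n items; weights: one weight per view.
    Returns the per-view bases U_m of shape (d_m, k) and the shared V of shape (k, n).
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    V = rng.standard_normal((k, n))
    I = np.eye(k)
    for _ in range(iters):
        # Fix V and solve each basis in closed form: U_m = X_m V^T (V V^T + lam I)^-1.
        VVt = V @ V.T + lam * I
        Us = [np.linalg.solve(VVt, V @ X.T).T for X in views]
        # Fix every U_m and solve (sum_m w_m U_m^T U_m + lam I) V = sum_m w_m U_m^T X_m.
        A = lam * I + sum(w * U.T @ U for w, U in zip(weights, Us))
        rhs = sum(w * U.T @ X for w, U, X in zip(weights, Us, views))
        V = np.linalg.solve(A, rhs)
    return Us, V


def to_codes(V):
    """Binarize latent factors by sign into {-1, +1}; rows of the result are items."""
    return np.where(V >= 0, 1, -1).T


def fused_codes(X_img, X_txt, Y, k_shared=16, k_spec=8, beta=1.0):
    """Hypothetical fusion: [unified bits | modality-specific bits] for every item."""
    # Inter-modality step: image features, text features and labels share one V.
    _, V_shared = collective_mf([X_img, X_txt, Y], [1.0, 1.0, beta], k_shared)
    B_shared = to_codes(V_shared)
    fused = {}
    for name, X in (("image", X_img), ("text", X_txt)):
        # Intra-modality step: a separate latent space per modality, still label-guided.
        _, V_m = collective_mf([X, Y], [1.0, beta], k_spec)
        fused[name] = np.hstack([B_shared, to_codes(V_m)])
    return fused


def hamming(query_codes, db_codes):
    """Pairwise Hamming distances between {-1, +1} code matrices (rows = items)."""
    bits = query_codes.shape[1]
    # For +/-1 vectors, inner product = bits - 2 * (number of differing bits).
    return (bits - query_codes @ db_codes.T) // 2


# Toy data: 40 items, 4 classes, features generated from the labels plus noise.
rng = np.random.default_rng(1)
n, c = 40, 4
Y = np.eye(c)[rng.integers(0, c, n)].T          # one-hot labels, shape (c, n)
X_img = rng.standard_normal((32, c)) @ Y + 0.1 * rng.standard_normal((32, n))
X_txt = rng.standard_normal((20, c)) @ Y + 0.1 * rng.standard_normal((20, n))

codes = fused_codes(X_img, X_txt, Y)
D = hamming(codes["image"], codes["text"])      # image-query to text-database distances
ranking = np.argsort(D, axis=1)                 # nearest text items first for each image
```

Because the two intra-modality factorizations in this sketch are learned independently, their bits are not aligned across modalities; only the shared block is directly comparable, and the distance above is computed over the full fused code purely for illustration. The abstract does not state how the modality-specific part is used across modalities or how unseen queries are encoded, so those steps are deliberately left out.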

Data availability and access

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the Humanities and Social Sciences Project of the Education Ministry (20YJA870013) and the Scientific Research Studio in Colleges and Universities of Ji’nan City (202228105). The authors would like to thank the reviewers for their helpful comments.

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University of Finance and Economics, East of Erhuan Road 7366, Jinan, 250014, Shandong, China
Dongxue Shi, Zheng Liu, Shanshan Gao & Ang Li
Shandong Key Laboratory of Blockchain Finance, East of Erhuan Road 7366, Jinan, 250014, Shandong, China
Dongxue Shi
Shandong Provincial Key Laboratory of Digital Media Technology, East of Erhuan Road 7366, Jinan, 250014, Shandong, China
Zheng Liu, Shanshan Gao & Ang Li

Contributions

Dongxue Shi: Writing - original draft, Conceptualization, Methodology, Software, Investigation, Validation. Zheng Liu: Conceptualization, Writing - review & editing, Supervision, Project administration, Funding acquisition. Shanshan Gao: Writing - review & editing, Funding acquisition. Ang Li: Software, Writing - review & editing, Supervision.

Corresponding author

Correspondence to Zheng Liu.

Ethics declarations

Competing interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical and informed consent for data used

The work described in this manuscript is original and has not been under consideration for publication elsewhere. All authors read and approved the final manuscript. The research in this manuscript does not involve human participants or animals. We obtain ethical and informed consent from data subjects before collecting, using, or disclosing their personal data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, D., Liu, Z., Gao, S. et al. Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval. Appl Intell 55, 5 (2025). https://doi.org/10.1007/s10489-024-06060-2

Accepted: 08 September 2024
Published: 19 November 2024
DOI: https://doi.org/10.1007/s10489-024-06060-2
