Robust supervised matrix factorization hashing with application to cross-modal retrieval

  • Original Article
Neural Computing and Applications

Abstract

In recent years, hashing methods have received extensive attention in multimedia search because of their high computational and storage efficiency. However, most of them learn a common representation of the multi-modal data and use it to generate hash codes, ignoring the specific properties of each modality. To mitigate this problem, we propose a novel hashing method, called Robust Supervised Matrix Factorization Hashing (RSMFH), which preserves both the shared and the specific properties of multi-modal data by decomposing each modality into a common representation and an inconsistent representation. Moreover, we simultaneously impose sparse constraints on the inconsistent part of each modality and minimize the production of the consistent parts. In addition, the supervised label information of the data is embedded into the learned hash codes, which enhances the discriminative ability of RSMFH. We employ an efficient discrete optimization strategy to solve the proposed model. Extensive experiments on four benchmark databases show that our approach achieves promising results in cross-modal retrieval tasks.
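
The decomposition idea in the abstract can be illustrated with a small toy sketch. It is not the RSMFH algorithm: the model X_m ≈ U_m V + E_m, the alternating ridge least-squares updates, the soft-thresholding step (used here as a simple stand-in for the sparse constraint), and all function and parameter names are assumptions made for illustration. Label supervision and the discrete optimization strategy are omitted entirely.

```python
import numpy as np


def soft_threshold(a, tau):
    """Shrink entries toward zero; entries with |a| <= tau become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)


def factorize(xs, k, tau=0.1, ridge=1e-3, iters=20, seed=0):
    """Toy decomposition X_m ~= U_m V + E_m (assumed model, not the paper's).

    xs : list of arrays, one per modality, each of shape (d_m, n)
    V  : shared (common) representation of shape (k, n)
    E_m: modality-specific sparse residual of shape (d_m, n)
    """
    rng = np.random.default_rng(seed)
    n = xs[0].shape[1]
    v = rng.standard_normal((k, n))
    es = [np.zeros_like(x) for x in xs]
    eye = ridge * np.eye(k)
    for _ in range(iters):
        # Ridge least squares for each modality's basis U_m, with V fixed.
        gram_inv = np.linalg.inv(v @ v.T + eye)
        us = [(x - e) @ v.T @ gram_inv for x, e in zip(xs, es)]
        # Ridge least squares for the shared V, with every U_m fixed.
        lhs = sum(u.T @ u for u in us) + eye
        rhs = sum(u.T @ (x - e) for u, x, e in zip(us, xs, es))
        v = np.linalg.solve(lhs, rhs)
        # Sparse modality-specific part: shrink what the shared part misses.
        es = [soft_threshold(x - u @ v, tau) for x, u in zip(xs, us)]
    return us, v, es


def binarize(v):
    """Map a real-valued representation to hash codes in {-1, +1}."""
    return np.where(v >= 0, 1, -1)


def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query.

    query_code has shape (k,), db_codes has shape (k, n), both in {-1, +1}.
    """
    k = db_codes.shape[0]
    dist = (k - query_code @ db_codes) / 2
    return np.argsort(dist, kind="stable")
```

In this sketch, the columns of `binarize(v)` play the role of the hash codes, and `hamming_rank` returns database indices ordered from nearest to farthest. A real cross-modal system would also need a way to encode unseen queries from a single modality, which this toy version does not cover.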

Data availability

The datasets analyzed during the current study are available in the LabelMe, UCL, Pascal Sentences, and Wiki repositories: http://labelme.csail.mit.edu/Release3.0/, https://www.ucl.ac.uk/library, https://github.com/rupy/PascalSentenceDataset, http://www.svcl.ucsd.edu/projects/crossmodal/.

Funding

This work was supported by the National Natural Science Foundation of China [Grant Nos. 61603159, 62162033, U21B2027, U1836218], the Yunnan Provincial Major Science and Technology Special Plan Projects [Grant Nos. 202002AD080001, 202103AA080015], the Yunnan Foundation Research Projects [Grant Nos. 202101AT070438, 202101BE070001-056], and the Excellent Key Teachers of the QingLan Project in Jiangsu Province.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and experimental analysis were performed by ZS, KY and DZ. The first draft of the manuscript was written by KY and ZS. JY, ZY and XJW commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhenqiu Shu.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest related to this work.

Ethical approval

Our study did not involve animals.

Informed consent

Our study did not involve human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shu, Z., Yong, K., Zhang, D. et al. Robust supervised matrix factorization hashing with application to cross-modal retrieval. Neural Comput & Applic 35, 6665–6684 (2023). https://doi.org/10.1007/s00521-022-08006-6

  • DOI: https://doi.org/10.1007/s00521-022-08006-6
