Discrete matrix factorization hashing for cross-modal retrieval

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Cross-modal hashing has recently attracted considerable attention in large-scale retrieval because of its low storage cost and high retrieval efficiency. However, existing hashing methods still have unresolved issues. First, most existing cross-modal hashing methods map the original data into a common Hamming space to learn unified hash codes, which ignores the modality-specific properties of multi-modal data. Second, most of them relax the discrete constraint when learning hash codes, which may lead to quantization loss and suboptimal performance. To address these problems, this paper proposes a novel cross-modal retrieval method named discrete matrix factorization hashing (DMFH). DMFH is a two-stage approach. In the first stage, given the training data, DMFH uses matrix factorization to learn a modality-specific semantic representation for each modality and then generates the corresponding hash codes by linear projection. To ensure that the hash codes preserve the semantic similarity between different modalities, DMFH also optimizes the hash codes with an affinity matrix constructed from the label information. Within this first stage, DMFH uses a discrete optimization algorithm to handle the discrete constraint directly rather than relaxing it. In the second stage, given the hash codes learned in the first stage, DMFH uses kernel logistic regression to learn nonlinear hash functions for each modality, which generate hash codes for unseen instances. Extensive experimental results on three public benchmark datasets show that the proposed DMFH outperforms several state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
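The abstract does not give formulas, so the following is only a minimal sketch of three generic building blocks it mentions: a label-based affinity matrix, binary codes obtained from a linear projection, and Hamming ranking at retrieval time. The function names, the rule that two instances are similar when they share at least one label, and the {-1, +1} code convention are assumptions made for illustration; this is not the DMFH optimization itself.

```python
import numpy as np


def label_affinity(labels: np.ndarray) -> np.ndarray:
    """Pairwise affinity from a multi-hot label matrix of shape (n, c).

    Assumption: S[i, j] = 1 if instances i and j share at least one label,
    otherwise 0. The paper may define its affinity matrix differently.
    """
    shared = labels @ labels.T
    return (shared > 0).astype(np.int8)


def binarize(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Binary codes in {-1, +1} from the sign of a linear projection."""
    codes = np.sign(features @ projection)
    codes[codes == 0] = 1  # map exact zeros to +1 so every bit is defined
    return codes.astype(np.int8)


def hamming_rank(query_code: np.ndarray, database_codes: np.ndarray) -> np.ndarray:
    """Database indices sorted by increasing Hamming distance to the query.

    For codes in {-1, +1} with r bits, distance = (r - <q, b>) / 2.
    """
    n_bits = database_codes.shape[1]
    inner = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    distances = (n_bits - inner) // 2
    return np.argsort(distances, kind="stable")
```

In a cross-modal setting, the query code would come from one modality (for example, text) and the database codes from another (for example, images). Because both live in the same Hamming space, the ranking step is identical regardless of modality.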

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 61972102, 62006048, and 61772141; in part by the Guangdong Provincial Natural Science Foundation under Grant 2021A1515012017; in part by the Science and Technology Planning Project of Guangdong Province, China, under Grants 2019B020208001 and 2019B110210002; and in part by the Guangzhou Science and Technology Planning Project under Grants 201903010107 and 201802010042.

Author information

Corresponding author

Correspondence to Na Han.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Fang, X., Liu, Z., Han, N. et al. Discrete matrix factorization hashing for cross-modal retrieval. Int. J. Mach. Learn. & Cyber. 12, 3023–3036 (2021). https://doi.org/10.1007/s13042-021-01395-5

  • DOI: https://doi.org/10.1007/s13042-021-01395-5