Discrete matrix factorization hashing for cross-modal retrieval

Fang, Xiaozhao; Liu, Zhihu; Han, Na; Jiang, Lin; Teng, Shaohua

doi:10.1007/s13042-021-01395-5

Original Article
Published: 02 August 2021

Volume 12, pages 3023–3036, (2021)

International Journal of Machine Learning and Cybernetics

Xiaozhao Fang¹,
Zhihu Liu¹,
Na Han ORCID: orcid.org/0000-0002-9639-7633²,
Lin Jiang¹ &
…
Shaohua Teng¹

Abstract

Cross-modal hashing has recently attracted considerable attention in large-scale retrieval because of its low storage cost and high retrieval efficiency. However, existing hashing methods still have unresolved issues. For example, most existing cross-modal hashing methods map the original data into a common Hamming space to learn unified hash codes, which ignores the modality-specific properties of multi-modal data. In addition, most of them relax the discrete constraint when learning hash codes, which may lead to quantization loss and suboptimal performance. To address these problems, this paper proposes a novel cross-modal retrieval method named discrete matrix factorization hashing (DMFH). DMFH is a two-stage approach. In the first stage, given training data, DMFH uses matrix factorization to learn a modality-specific semantic representation for each modality, then generates the corresponding hash codes by linear projection. Meanwhile, to ensure that the hash codes preserve the semantic similarity between different modalities, DMFH optimizes the hash codes with an affinity matrix constructed from the label information. Also in the first stage, DMFH uses a discrete optimization algorithm to handle the discrete constraint in learning hash codes. In the second stage, given the hash codes learned in the first stage, DMFH uses kernel logistic regression to learn nonlinear features from unseen instances, then generates the corresponding hash codes for each modality. Extensive experimental results on three public benchmark datasets show that the proposed DMFH outperforms several state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
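Two ingredients named in the abstract, a label-based affinity matrix and retrieval over binary hash codes, can be illustrated with a minimal Python sketch. This is not the DMFH algorithm itself: the abstract does not define the affinity matrix, so the convention used here (entry 1 when two instances share at least one label, 0 otherwise) is an assumption, and the function names `label_affinity`, `hamming`, and `rank_by_hamming` are hypothetical helpers introduced only for illustration. The codes in the example are hand-written, not learned.

```python
from typing import List, Sequence


def label_affinity(labels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Build an n x n affinity matrix from multi-hot label vectors.

    Assumed convention (not taken from the paper): S[i][j] = 1 if
    instances i and j share at least one label, otherwise 0.
    """
    n = len(labels)
    return [
        [1 if any(a and b for a, b in zip(labels[i], labels[j])) else 0
         for j in range(n)]
        for i in range(n)
    ]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance."""
    return sorted(range(len(database)),
                  key=lambda i: hamming(query, database[i]))


# Three instances with two possible labels each (multi-hot vectors).
labels = [[1, 0], [1, 1], [0, 1]]
S = label_affinity(labels)

# Hand-written 4-bit codes in {-1, +1}: one query code from one modality,
# three database codes from another modality.
query_code = [1, -1, 1, 1]
database_codes = [[-1, 1, -1, -1], [1, -1, 1, 1], [1, 1, 1, -1]]
ranking = rank_by_hamming(query_code, database_codes)
```

In a cross-modal setting the query code and the database codes would come from different modalities (for example an image query against text codes); ranking by Hamming distance is what keeps retrieval cheap once the codes exist.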

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61972102, Grant 62006048, and Grant 61772141; in part by the Guangdong Provincial Natural Science Foundation under Grant 2021A1515012017; in part by the Science and Technology Planning Project of Guangdong Province, China, under Grant 2019B020208001 and Grant 2019B110210002; and in part by the Guangzhou Science and Technology Planning Project under Grant 201903010107 and Grant 201802010042.

Author information

Authors and Affiliations

School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006, China
Xiaozhao Fang, Zhihu Liu, Lin Jiang & Shaohua Teng
School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, China
Na Han

Corresponding author

Correspondence to Na Han.

About this article

Cite this article

Fang, X., Liu, Z., Han, N. et al. Discrete matrix factorization hashing for cross-modal retrieval. Int. J. Mach. Learn. & Cyber. 12, 3023–3036 (2021). https://doi.org/10.1007/s13042-021-01395-5

Received: 10 July 2020
Accepted: 22 July 2021
Published: 02 August 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s13042-021-01395-5
