Abstract
Cross-modal hashing has recently attracted considerable attention in large-scale retrieval because of its low storage cost and high retrieval efficiency. However, existing hashing methods still leave several issues unresolved. For example, most existing cross-modal hashing methods map the original data into a common Hamming space to learn unified hash codes, which ignores the specific properties of each modality. In addition, most of them relax the discrete constraint when learning hash codes, which may lead to quantization loss and suboptimal performance. To address these problems, this paper proposes a novel cross-modal retrieval method named discrete matrix factorization hashing (DMFH). DMFH is a two-stage approach. In the first stage, given training data, DMFH uses matrix factorization to learn a modality-specific semantic representation for each modality, and then generates the corresponding hash codes by linear projection. To ensure that the hash codes preserve the semantic similarity between different modalities, DMFH also optimizes the hash codes with an affinity matrix constructed from the label information. Also in the first stage, DMFH uses a discrete optimization algorithm to handle the discrete constraint in hash code learning. In the second stage, given the hash codes learned in the first stage, DMFH uses kernel logistic regression to learn nonlinear features of unseen instances and then generates the corresponding hash codes for each modality. Extensive experimental results on three public benchmark datasets show that the proposed DMFH outperforms several state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
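The building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the DMFH algorithm: the discrete optimization step and the kernel logistic regression stage are omitted, plain alternating ridge least squares stands in for the factorization, and the projection matrix `P` is simply supplied by the caller. All function names, the ±1 coding convention, and the parameters `k`, `lam`, and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def label_affinity(labels):
    """Affinity from labels (n x c, 0/1): +1 if two items share a label, else -1."""
    shared = labels @ labels.T > 0
    return np.where(shared, 1, -1)

def factorize(X, k, n_iter=50, lam=1e-2, seed=0):
    """Approximate X (d x n) as U @ V with U (d x k) and V (k x n).

    Uses alternating ridge-regularized least squares; this is a generic
    stand-in, not the update rules of the paper.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((k, X.shape[1]))
    I = np.eye(k)
    for _ in range(n_iter):
        # Fix V, solve for U: U = X V^T (V V^T + lam I)^{-1}
        U = np.linalg.solve(V @ V.T + lam * I, V @ X.T).T
        # Fix U, solve for V: V = (U^T U + lam I)^{-1} U^T X
        V = np.linalg.solve(U.T @ U + lam * I, U.T @ X)
    return U, V

def hash_codes(V, P):
    """Codes in {-1, +1}: linear projection P (r x k) of V, then sign."""
    return np.where(P @ V >= 0, 1, -1)

def hamming(B_query, B_db):
    """Pairwise Hamming distances between the columns of two +-1 code matrices."""
    r = B_query.shape[0]
    # For +-1 codes of length r, <a, b> = r - 2 * hamming(a, b).
    return (r - B_query.T @ B_db) // 2
```

In a cross-modal setting one would call `factorize` once per modality, project each latent representation to codes of the same length, and rank database items of one modality by `hamming` distance to a query from the other. Taking the sign after a real-valued solve is exactly the kind of relaxation that the abstract says DMFH avoids.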
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61972102, Grant 62006048, and Grant 61772141, in part by the Guangdong Provincial Natural Science Foundation under Grant 2021A1515012017, in part by the Science and Technology Planning Project of Guangdong Province, China, under Grant 2019B020208001 and Grant 2019B110210002, and in part by the Guangzhou Science and Technology Planning Project under Grant 201903010107 and Grant 201802010042.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fang, X., Liu, Z., Han, N. et al. Discrete matrix factorization hashing for cross-modal retrieval. Int. J. Mach. Learn. & Cyber. 12, 3023–3036 (2021). https://doi.org/10.1007/s13042-021-01395-5
DOI: https://doi.org/10.1007/s13042-021-01395-5