Abstract
Hashing is an effective technique for addressing large-scale data storage problems and achieving efficient retrieval, and it is also a core technology for promoting the intelligent development of new infrastructure construction. In most practical situations, label information is unavailable, and creating manual annotations is time-consuming and laborious. Unsupervised cross-modal hashing has therefore received extensive attention from the information retrieval community due to its fast retrieval speed and feasibility. However, existing unsupervised cross-modal hashing methods cannot comprehensively describe the complex relations among different modalities, such as the balance between complementarity and consistency across modalities. In this article, we propose a new unsupervised cross-modal hashing method called Fast Unsupervised Consistent and Modality-Specific Hashing (FUCMSH). Specifically, FUCMSH consists of two main modules: a shared matrix factorization module (SMFM) and an individual auto-encoding module (IAEM). In the SMFM, FUCMSH dynamically assigns weights to the modalities to adaptively balance their contributions, which guarantees the information completeness of the shared consistent representation. In the IAEM, FUCMSH learns a modality-specific latent representation for each modality through modality-specific linear autoencoders. Moreover, FUCMSH uses transfer learning to link the relationships between the individual modality-specific latent representations. By combining the SMFM and the IAEM, the discriminative capability of the generated binary codes can be significantly improved. Extensive experimental results demonstrate the superiority of the proposed FUCMSH.
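To make the idea of a weighted shared factorization more concrete, the following is a minimal sketch and not the FUCMSH objective itself, which is not given in the abstract. It assumes a generic collective matrix factorization in which each modality X_m is approximated by U_m V with a shared representation V, a squared-weight scheme (weights proportional to the inverse of each modality's cost, which is the closed-form minimizer when the weights sum to one), and sign binarization of V. The function names, the regularization parameter lam, and the synthetic two-view data are all hypothetical.

```python
import numpy as np


def shared_factorization(views, n_bits, lam=1e-2, n_iters=20, seed=0):
    """Alternating least squares for sum_m w_m^2 (||X_m - U_m V||^2 + lam ||U_m||^2) + lam ||V||^2.

    views: list of arrays of shape (d_m, n_samples), one per modality (assumed layout).
    Returns binary codes in {-1, +1} of shape (n_bits, n_samples) and the modality weights.
    """
    rng = np.random.default_rng(seed)
    n_samples = views[0].shape[1]
    V = rng.standard_normal((n_bits, n_samples))
    eye = np.eye(n_bits)
    weights = np.full(len(views), 1.0 / len(views))
    for _ in range(n_iters):
        # Basis update: U_m = X_m V^T (V V^T + lam I)^{-1}
        gram = V @ V.T + lam * eye
        bases = [np.linalg.solve(gram, V @ X.T).T for X in views]
        # Weight update: with sum(w) = 1, minimizing sum w_m^2 c_m gives w_m proportional to 1 / c_m
        costs = np.array([
            np.linalg.norm(X - U @ V) ** 2 + lam * np.linalg.norm(U) ** 2
            for X, U in zip(views, bases)
        ])
        inv_costs = 1.0 / np.maximum(costs, 1e-12)
        weights = inv_costs / inv_costs.sum()
        # Shared representation update: weighted normal equations over all modalities
        lhs = lam * eye + sum(w ** 2 * (U.T @ U) for w, U in zip(weights, bases))
        rhs = sum(w ** 2 * (U.T @ X) for w, U, X in zip(weights, bases, views))
        V = np.linalg.solve(lhs, rhs)
    # Binarize the shared representation; zeros are mapped to +1
    codes = np.where(V >= 0, 1, -1).astype(np.int8)
    return codes, weights


def hamming_distances(query_code, db_codes):
    """Hamming distance between one {-1, +1} code and every column of db_codes."""
    n_bits = db_codes.shape[0]
    inner = query_code.astype(np.int32) @ db_codes.astype(np.int32)
    return (n_bits - inner) // 2


# Synthetic two-modality data sharing one latent factor (illustrative only)
rng = np.random.default_rng(1)
latent = rng.standard_normal((8, 50))
image_view = rng.standard_normal((32, 8)) @ latent + 0.1 * rng.standard_normal((32, 50))
text_view = rng.standard_normal((16, 8)) @ latent + 0.1 * rng.standard_normal((16, 50))
codes, weights = shared_factorization([image_view, text_view], n_bits=8)
ranking = np.argsort(hamming_distances(codes[:, 0], codes), kind="stable")
```

In this sketch a modality that is reconstructed poorly receives a smaller weight, which is one simple way to balance contributions adaptively. The modality-specific autoencoders and the transfer-learning link described above are not modeled here.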





Data Availability Statement
This publication is supported by multiple datasets, which are openly available at the hyperlinks in the dataset section or at the locations cited in the reference section.
References
Gao D, Jin L, Chen B, Qiu M, Li P, Wei Y, Hu Y, Wang H (2020) Fashionbert: text and image matching with adaptive loss for cross-modal retrieval. In: ACM SIGIR. ACM, pp 2251–2260
Lin K, Xu X, Gao L, Wang Z, Shen HT (2020) Learning cross-aligned latent embeddings for zero-shot cross-modal retrieval. In: AAAI. AAAI Press, pp 11515–11522
Wang B, Yang Y, Xu X, Hanjalic A, Shen HT (2017) Adversarial cross-modal retrieval. In: ACM MM. ACM, pp 154–162
Wu Y, Wang S, Huang Q (2020) Online fast adaptive low-rank similarity learning for cross-modal retrieval. IEEE Trans Multimed 22(5):1310–1322
Zhang Y, Zhou W, Wang M, Tian Q, Li H (2021) Deep relation embedding for cross-modal retrieval. IEEE Trans Image Process 30:617–627
Wang Z, Zhang Z, Luo Y, Huang Z, Shen HT (2021) Deep collaborative discrete hashing with semantic-invariant structure construction. IEEE Trans Multimed 23:1274–1286
Qiang H, Wan Y, Liu Z, Xiang L, Meng X (2020) Discriminative deep asymmetric supervised hashing for cross-modal retrieval. Knowl Based Syst 204:106188
Yang Z, Long J, Zhu L, Huang W (2020) Nonlinear robust discrete hashing for cross-modal retrieval. In: ACM SIGIR, pp 1349–1358
Li Z, Tang J, Zhang L, Yang J (2020) Weakly-supervised semantic guided hashing for social image retrieval. Int J Comput Vis 128(8):2265–2278
Fang Y, Li B, Li X, Ren Y (2021) Unsupervised cross-modal similarity via latent structure discrete hashing factorization. Knowl Based Syst 218:106857
Mandal D, Chaudhury KN, Biswas S (2019) Generalized semantic preserving hashing for cross-modal retrieval. IEEE Trans Image Process 28(1):102–112
Bronstein MM, Bronstein AM, Michel F, Paragios N (2010) Data fusion through cross-modality metric learning using similarity-sensitive hashing. In: CVPR, pp 3594–3601
Liu X, Nie X, Zeng W, Cui C, Zhu L, Yin Y (2018) Fast discrete cross-modal hashing with regressing from semantic labels. In: ACM MM, pp 1662–1669
Shen F, Shen C, Liu W, Shen HT (2015) Supervised discrete hashing. In: IEEE conference on computer vision and pattern recognition, CVPR 2015, Boston, MA, USA, June 7–12, 2015, pp 37–45
Luo X, Zhang P, Wu Y, Chen Z, Huang H, Xu X (2018) Asymmetric discrete cross-modal hashing. In: ICMR, pp 204–212
Liu H, Wang R, Shan S, Chen X (2016) Deep supervised hashing for fast image retrieval. In: CVPR, pp 2064–2072
Yang Z, Raymond OI, Huang W, Liao Z, Zhu L, Long J (2020) Scalable deep asymmetric hashing via unequal-dimensional embeddings for image similarity search. Neurocomputing 412:262–275
Li F, Wang T, Zhu L, Zhang Z, Wang X (2021) Task-adaptive asymmetric deep cross-modal hashing. Knowl Based Syst 219:106851
Deng C, Yang E, Liu T, Li J, Liu W, Tao D (2019) Unsupervised semantic-preserving adversarial hashing for image search. IEEE Trans Image Process 28(8):4032–4044
Song J, Yang Y, Yang Y, Huang Z, Shen HT (2013) Inter-media hashing for large-scale retrieval from heterogeneous data sources. In: SIGMOD, pp 785–796
Zhou J, Ding G, Guo Y (2014) Latent semantic sparse hashing for cross-modal similarity search. In: SIGIR, pp 415–424
He K, Wen F, Sun J (2013) K-means hashing: an affinity-preserving quantization method for learning binary compact codes. In: CVPR, pp 2938–2945
Shen F, Xu Y, Liu L, Yang Y, Huang Z, Shen HT (2018) Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Trans Pattern Anal Mach Intell 40(12):3034–3044
Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: NIPS, pp 1753–1760
Zhang H, Liu L, Long Y, Shao L (2018) Unsupervised deep hashing with pseudo labels for scalable image retrieval. IEEE Trans Image Process 27(4):1626–1638
Fang Y, Zhang H, Ren Y (2019) Unsupervised cross-modal retrieval via multi-modal graph regularized smooth matrix factorization hashing. Knowl Based Syst 171:69–80
Yu J, Zhou H, Zhan Y, Tao D (2021) Deep graph-neighbor coherence preserving network for unsupervised cross-modal hashing. In: AAAI. AAAI Press, pp 4626–4634
Zhang D, Li W (2014) Large-scale supervised multimodal hashing with semantic correlation maximization. In: AAAI, pp 2177–2183
Xu X, Shen F, Yang Y, Shen HT, Li X (2017) Learning discriminative binary codes for large-scale cross-modal retrieval. IEEE Trans Image Process 26(5):2494–2507
Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: CVPR, pp 3864–3872
Kim S, Choi S (2013) Multi-view anchor graph hashing. In: ICASSP. IEEE, pp 3123–3127
Meng M, Wang H, Yu J, Chen H, Wu J (2021) Asymmetric supervised consistent and specific hashing for cross-modal retrieval. IEEE Trans Image Process 30:986–1000
Sun L, Ji S, Ye J (2008) A least squares formulation for canonical correlation analysis. In: ICML, ACM International Conference Proceeding Series, vol 307, pp 1024–1031
Kumar S, Udupa R (2011) Learning hash functions for cross-view similarity search. In: IJCAI, pp 1360–1365
Lee K, Chen X, Hua G, Hu H, He X (2018) Stacked cross attention for image-text matching. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) ECCV, Lecture notes in computer science, vol 11208. Springer, Berlin, pp 212–228
Ding G, Guo Y, Zhou J (2014) Collective matrix factorization hashing for multimodal data. In: CVPR, pp 2083–2090
Wang D, Wang Q, He L, Gao X, Tian Y (2020) Joint and individual matrix factorization hashing for large-scale cross-modal retrieval. Pattern Recognit 107:107479
Cheng M, Jing L, Ng MK (2020) Robust unsupervised cross-modal hashing for multimedia retrieval. ACM Trans Inf Syst 38(3):30:1–30:25
Wang L, Yang J, Zareapoor M, Zheng Z (2021) Cluster-wise unsupervised hashing for cross-modal similarity search. Pattern Recognit 111:107732
Ji D, Gao J, Fei H, Teng C, Ren Y (2020) A deep neural network model for speakers coreference resolution in legal texts. Inf Process Manag 57(6):102365
Farrugia RA, Guillemot C (2020) Light field super-resolution using a low-rank prior and deep convolutional neural networks. IEEE Trans Pattern Anal Mach Intell 42(5):1162–1175
Zhang C, Liu A, Liu X, Xu Y, Yu H, Ma Y, Li T (2021) Interpreting and improving adversarial robustness of deep neural networks with neuron sensitivity. IEEE Trans Image Process 30:1291–1304
Fu X, Wang W, Huang Y, Ding X, Paisley JW (2021) Deep multiscale detail networks for multiband spectral image sharpening. IEEE Trans Neural Netw Learn Syst 32(5):2090–2104
Yang E, Deng C, Liu W, Liu X, Tao D, Gao X (2017) Pairwise relationship guided deep hashing for cross-modal retrieval. In: AAAI, pp 1618–1625
Jiang Q, Li W (2017) Deep cross-modal hashing. In: CVPR, pp 3270–3278
Wu G, Lin Z, Han J, Liu L, Ding G, Zhang B, Shen J (2018) Unsupervised deep hashing via binary latent factor models for large-scale cross-modal retrieval. In: IJCAI, pp 2854–2860
Zhu L, Lu X, Cheng Z, Li J, Zhang H (2020) Deep collaborative multi-view hashing for large-scale image search. IEEE Trans Image Process 29:4643–4655
Zhang J, Peng Y, Yuan M (2018) Unsupervised generative adversarial cross-modal hashing. In: AAAI, pp 539–546
Shao M, Kit D, Fu Y (2014) Generalized transfer subspace learning through low-rank constraint. Int J Comput Vis 109(1–2):74–93
Kafai M, Eshghi K (2019) CROification: accurate kernel classification with the efficiency of sparse linear SVM. IEEE Trans Pattern Anal Mach Intell 41(1):34–48
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Rasiwasia N, Pereira JC, Coviello E, Doyle G, Lanckriet GRG, Levy R, Vasconcelos N (2010) A new approach to cross-modal multimedia retrieval. In: ACM MM. ACM, pp 251–260
Huiskes MJ, Lew MS (2008) The MIR flickr retrieval evaluation. In: ACM SIGMM, pp 39–43
Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: Proceedings of the 8th ACM international conference on image and video retrieval, CIVR 2009, Santorini Island, Greece, July 8–10, 2009
Chen Z, Wang Y, Li H, Luo X, Nie L, Xu X (2019) A two-step cross-modal hashing by exploiting label correlations and preserving similarity in both steps. In: ACM MM, pp 1694–1702
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: ICLR
Deng J, Dong W, Socher R, Li R, Li K, Li F (2009) Imagenet: a large-scale hierarchical image database. In: CVPR, pp 248–255
Liu H, Lin M, Zhang S, Wu Y, Huang F, Ji R (2018) Dense auto-encoder hashing for robust cross-modality retrieval. In: ACM MM. ACM, pp 1589–1597
Zheng C, Zhu L, Cheng Z, Li J, Liu A (2021) Adaptive partial multi-view hashing for efficient social image retrieval. IEEE Trans Multimed 23:4079–4092
Acknowledgements
This work was supported in part by the National Key R&D Program of China under Grant 2021YFB3900902, in part by the National Natural Science Foundation of China under Grants 62202501 and U2003208, and in part by the Science and Technology Plan of Hunan Province under Grants 2022JJ40638 and 2016TP1003.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Yang, Z., Deng, X. & Long, J. Fast unsupervised consistent and modality-specific hashing for multimedia retrieval. Neural Comput & Applic 35, 6207–6223 (2023). https://doi.org/10.1007/s00521-022-08008-4
DOI: https://doi.org/10.1007/s00521-022-08008-4