Abstract
Hashing methods have been proposed for cross-modal retrieval tasks because of their flexibility and effectiveness. The main idea of cross-modal hashing is to embed heterogeneous multimedia data into a common Hamming space. Effectively exploiting the semantic information of each modality and reducing optimization loss remain challenging problems for existing cross-modal hashing methods. To address these issues, we propose a supervised cross-modal hashing method called Latent Semantic-Enhanced discrete Hashing (LSEH). LSEH first leverages matrix factorization to obtain individual latent semantic representations of the different modalities, and then applies correlation analysis and kernel discriminant analysis when projecting the latent semantic representations into the common Hamming space. Finally, the binary codes are generated directly with a discrete optimization strategy. Experimental results on four benchmark datasets demonstrate that LSEH outperforms state-of-the-art cross-modal hashing methods in terms of retrieval accuracy, especially on the image-to-text retrieval task, while using shorter hash codes to associate images and texts.
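
To make the idea of retrieval in a common Hamming space concrete, the following is a minimal sketch of ranking text items against an image query by Hamming distance. It is not the LSEH algorithm: the 8-bit codes are hypothetical placeholders for codes that learned hash functions would produce, and the names `hamming_distance` and `rank_by_hamming` are illustrative only.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: List[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by increasing Hamming distance."""
    scored = [(i, hamming_distance(query, code)) for i, code in enumerate(database)]
    # sorted() is stable, so ties keep their original database order.
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 8-bit codes (assumption); real codes would come from the
# hash functions learned for each modality.
image_query_code = 0b10110010
text_database_codes = [0b10110011, 0b01001101, 0b10100010, 0b11111111]

ranking = rank_by_hamming(image_query_code, text_database_codes)
```

Because both modalities share one code space, the comparison reduces to an XOR and a bit count. This is why shorter codes translate directly into cheaper storage and faster search.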
Acknowledgements
This work is supported by the Natural Science Foundation of China (71772107, 62072288), the Natural Science Foundation of Shandong Province of China (ZR2020MF044, ZR202102230289, ZR2019MF003, ZR2021MF104), the Shandong Education Quality Improvement Plan for Postgraduate (2021), the SDUST Research Fund, and the Humanity and Social Science Fund of the Ministry of Education under Grants 20YJAZH078 and 20YJAZH127.
Cite this article
Liu, Y., Ji, S., Fu, Q. et al. Latent semantic-enhanced discrete hashing for cross-modal retrieval. Appl Intell 52, 16004–16020 (2022). https://doi.org/10.1007/s10489-021-03143-2
DOI: https://doi.org/10.1007/s10489-021-03143-2