Latent semantic-enhanced discrete hashing for cross-modal retrieval

Applied Intelligence

Abstract

Hashing methods have been proposed for cross-modal retrieval tasks because of their flexibility and effectiveness. The main idea of cross-modal hashing is to embed heterogeneous multimedia data into a common Hamming space. Effectively exploiting modal semantic information while reducing optimization loss remains a challenging problem for existing cross-modal hashing methods. To address these issues, we propose a supervised cross-modal hashing method called Latent Semantic-Enhanced discrete Hashing (LSEH). LSEH first leverages matrix factorization to obtain individual latent semantic representations of the different modalities, and then applies correlation analysis and kernel discriminant analysis when projecting these latent semantic representations into the common Hamming space. Finally, the binary codes are generated directly with a discrete optimization strategy. Experimental results on four benchmark datasets demonstrate that LSEH outperforms state-of-the-art cross-modal hashing methods in retrieval accuracy, especially on the image-to-text retrieval task, while using shorter hash codes to associate images and texts.
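
To make the idea of a common Hamming space concrete, the following is a minimal sketch of generic cross-modal hash retrieval, not the LSEH algorithm itself. It assumes each modality already has a linear projection into an r-bit space; here the projections are random placeholders, whereas LSEH learns them through matrix factorization, correlation analysis, kernel discriminant analysis, and discrete optimization. The function names (`to_hash_codes`, `hamming_distances`, `retrieve`) and the toy dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def to_hash_codes(features, projection):
    """Project real-valued features and binarize them to codes in {-1, +1}."""
    codes = np.sign(features @ projection)
    codes[codes == 0] = 1  # map exact zeros to +1 so every bit is +/-1
    return codes.astype(np.int8)


def hamming_distances(query_code, database_codes):
    """Hamming distance from one code to every row of database_codes.

    For codes in {-1, +1} of length r, distance = (r - inner_product) / 2.
    """
    r = database_codes.shape[1]
    inner = database_codes.astype(np.int32) @ query_code.astype(np.int32)
    return (r - inner) // 2


def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database items closest in Hamming space."""
    distances = hamming_distances(query_code, database_codes)
    return np.argsort(distances, kind="stable")[:top_k]


# Toy data: 6 texts and 1 image query with different feature dimensions.
# The random projections are placeholders (an assumption of this sketch),
# so the ranking below is not semantically meaningful.
rng = np.random.default_rng(0)
n_bits = 16
text_features = rng.standard_normal((6, 5))
image_query = rng.standard_normal((1, 8))
text_projection = rng.standard_normal((5, n_bits))
image_projection = rng.standard_normal((8, n_bits))

text_codes = to_hash_codes(text_features, text_projection)
image_code = to_hash_codes(image_query, image_projection)[0]

# Image-to-text retrieval: rank texts by Hamming distance to the image code.
ranked_text_ids = retrieve(image_code, text_codes, top_k=3)
```

Because both modalities end up as binary codes of the same length, an image query can be compared against text codes with integer arithmetic alone, which is what makes shorter codes attractive for storage and search speed.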

Acknowledgements

This work was supported by the Natural Science Foundation of China (71772107, 62072288), the Natural Science Foundation of Shandong Province of China (ZR2020MF044, ZR202102230289, ZR2019MF003, ZR2021MF104), the Shandong Education Quality Improvement Plan for Postgraduate (2021), the SDUST Research Fund, and the Humanity and Social Science Fund of the Ministry of Education under Grants 20YJAZH078 and 20YJAZH127.

Author information

Corresponding authors

Correspondence to Shujuan Ji or Maoguo Gong.

About this article

Cite this article

Liu, Y., Ji, S., Fu, Q. et al. Latent semantic-enhanced discrete hashing for cross-modal retrieval. Appl Intell 52, 16004–16020 (2022). https://doi.org/10.1007/s10489-021-03143-2

  • DOI: https://doi.org/10.1007/s10489-021-03143-2
