Learning latent hash codes with discriminative structure preserving for cross-modal retrieval

  • Short paper
  • Pattern Analysis and Applications

Abstract

Owing to their low storage cost and high computational efficiency, hashing approaches have drawn considerable interest and achieved great success in multimodal retrieval. However, most existing works model the local geometric structure in the original feature space, which suffers from intra- and inter-modality ambiguity and results in weakly discriminative hash codes. To address this issue, we propose a novel cross-modal hashing approach that takes both inter- and intra-modality structure preservation into account, dubbed discriminative structure preserving hashing (DSPH). Specifically, DSPH explores intra- and inter-modality structure in the latent space of a constructed common space. In addition, local geometric consistency is improved by a supervised shrinking scheme. DSPH learns the hash codes and latent features with a factorization-based coding scheme. The objective function combines common latent subspace learning with inter- and intra-modality structure embedding. We devise an alternating optimization scheme in which the hash codes are solved bitwise, so that large quantization error is avoided. As a result, DSPH generates more discriminative hash codes. Extensive experiments on several widely used databases demonstrate that the proposed algorithm outperforms several state-of-the-art cross-media retrieval methods.
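The abstract says DSPH solves the hash codes bitwise, which avoids the quantization error that relax-then-round schemes introduce. DSPH's exact objective is not given here, so the following Python/NumPy sketch is only an illustration under stated assumptions. It uses a hypothetical stand-in loss, ||Y - W^T B||_F^2 - 2 tr(B^T P), together with the discrete cyclic coordinate (DCC) bit update known from supervised discrete hashing (Shen et al., 2015). It is not the authors' method. The matrices `W`, `Y`, `P` and all function names are hypothetical. The sketch also shows why binary codes make retrieval cheap: for codes in {-1,+1}^r, the Hamming distance follows from a single inner product.

```python
import numpy as np


def objective(B, W, Y, P):
    """Assumed stand-in loss (not DSPH's): ||Y - W^T B||_F^2 - 2 tr(B^T P)."""
    R = Y - W.T @ B
    return float(np.sum(R * R) - 2.0 * np.sum(B * P))


def dcc_update(B, W, Y, P, n_sweeps=3):
    """Bitwise (discrete cyclic coordinate) update of codes B in {-1,+1}^(r x n).

    Each row of B (one bit for all n samples) is minimized in closed form with
    the other r-1 rows held fixed, so B stays binary throughout and there is no
    continuous relaxation followed by rounding.
    """
    B = B.copy()
    r = B.shape[0]
    Q = W @ Y + P  # r x n linear term obtained by expanding the loss
    for _ in range(n_sweeps):
        for k in range(r):
            rest = np.arange(r) != k
            B_rest, W_rest = B[rest], W[rest]  # all other bits and their rows of W
            # Closed-form bit update: z_k = sign(q_k - B'^T W' w_k), ties -> +1
            score = Q[k] - B_rest.T @ (W_rest @ W[k])
            B[k] = np.where(score >= 0, 1.0, -1.0)
    return B


def hamming_rank(query_code, db_codes):
    """Rank database codes (r x n, entries +/-1) by Hamming distance to a query (r,)."""
    r = db_codes.shape[0]
    dist = (r - query_code @ db_codes) / 2  # d_H = (r - <b_q, b_i>) / 2 for +/-1 codes
    return np.argsort(dist, kind="stable"), dist


# Toy usage with random data (sizes and inputs are arbitrary).
rng = np.random.default_rng(0)
r, c, n = 8, 5, 100  # code length, target dimension, number of samples
W = rng.standard_normal((r, c))
Y = rng.standard_normal((c, n))
P = rng.standard_normal((r, n))
B0 = np.where(rng.standard_normal((r, n)) >= 0, 1.0, -1.0)
B1 = dcc_update(B0, W, Y, P)
order, dist = hamming_rank(B1[:, 0], B1)
print(objective(B0, W, Y, P), objective(B1, W, Y, P), order[:5])
```

Each bit update exactly minimizes the loss over that row, because ||z||^2 = n is constant for binary z. Under this assumed loss, the objective therefore never increases from sweep to sweep, and the codes never leave {-1,+1}.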

Acknowledgements

This research was supported by the National Natural Science Foundation of China (Grants 61672265 and U1836218) and the 111 Project of the Chinese Ministry of Education (Grant B12018).

Author information

Corresponding author

Correspondence to Xiao-Jun Wu.

About this article

Cite this article

Zhang, D., Wu, XJ. & Yu, J. Learning latent hash codes with discriminative structure preserving for cross-modal retrieval. Pattern Anal Applic 24, 283–297 (2021). https://doi.org/10.1007/s10044-020-00893-6

  • DOI: https://doi.org/10.1007/s10044-020-00893-6
