
A novel cross-modal hashing algorithm based on multimodal deep learning


Research Paper, published in Science China Information Sciences.

Abstract

With the growing popularity of multimodal data on the Web, cross-modal retrieval over large-scale multimedia databases has become an important research topic. Hashing-based cross-modal retrieval methods assume that a latent space is shared by the features of all modalities. To model the relationship among heterogeneous data, most existing methods embed the data into a joint abstraction space via linear projections. However, such approaches are sensitive to noise in the data and cannot exploit unlabeled data or multimodal data with missing values, both of which are common in real-world applications. To address these challenges, we propose a novel multimodal deep-learning-based hashing (MDLH) algorithm. Specifically, MDLH uses a deep neural network to encode heterogeneous features into a compact common representation and learns the hash functions on top of this common representation. The parameters of the whole model are fine-tuned in a supervised training stage. Experiments on two standard datasets show that MDLH achieves more effective results than competing methods in cross-modal retrieval.
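The pipeline sketched in the abstract — per-modality deep encoders mapping heterogeneous features into a shared representation, which is then binarized into hash codes compared by Hamming distance — can be illustrated with a minimal NumPy sketch. The layer sizes, random weights, and helper names here are hypothetical; the actual MDLH architecture, loss functions, and supervised fine-tuning procedure are detailed in the full paper.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


class ModalityEncoder:
    """Hypothetical two-layer encoder mapping one modality's features
    into a shared d-dimensional representation (weights random here;
    MDLH learns them with supervised fine-tuning)."""

    def __init__(self, in_dim, hidden_dim, shared_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, shared_dim))

    def forward(self, x):
        # tanh keeps outputs in [-1, 1], making binarization natural
        return np.tanh(relu(x @ self.W1) @ self.W2)


def hash_codes(shared_repr):
    """Binarize the shared representation into +/-1 hash codes."""
    return np.where(shared_repr >= 0, 1, -1)


def hamming_distance(a, b):
    """Number of differing bits; small distance suggests similar items."""
    return int(np.sum(a != b))


# Illustrative feature dimensions: e.g. a 128-d image descriptor
# and a 10-d text topic vector, mapped to 16-bit codes.
img_enc = ModalityEncoder(128, 64, 16, seed=1)
txt_enc = ModalityEncoder(10, 64, 16, seed=2)

img_feat = np.random.default_rng(3).random(128)
txt_feat = np.random.default_rng(4).random(10)

img_code = hash_codes(img_enc.forward(img_feat))
txt_code = hash_codes(txt_enc.forward(txt_feat))
print("Hamming distance:", hamming_distance(img_code, txt_code))
```

Because both modalities end up as codes of the same length in the same space, retrieval reduces to ranking database items by Hamming distance to the query's code, regardless of which modality the query came from.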




Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61402091, 61370074) and the Fundamental Research Funds for the Central Universities of China (Grant No. N140404012).

Correspondence to Ge Yu.


Cite this article

Qu, W., Wang, D., Feng, S. et al. A novel cross-modal hashing algorithm based on multimodal deep learning. Sci. China Inf. Sci. 60, 092104 (2017). https://doi.org/10.1007/s11432-015-0902-2
