
A novel cross-modal hashing algorithm based on multimodal deep learning


Research Paper, published in Science China Information Sciences.

Abstract

With the growing popularity of multimodal data on the Web, cross-modal retrieval over large-scale multimedia databases has become an important research topic. Hashing-based cross-modal retrieval methods assume that a latent space is shared by the features of all modalities. To model the relationship among heterogeneous data, most existing methods embed the data into a joint abstraction space via linear projections. However, such approaches are sensitive to noise in the data and cannot exploit unlabeled data or multimodal data with missing values, both of which are common in real-world applications. To address these challenges, we propose a novel multimodal deep-learning-based hashing (MDLH) algorithm. Specifically, MDLH uses a deep neural network to encode heterogeneous features into a compact common representation and learns the hash functions on top of this common representation. The parameters of the whole model are fine-tuned in a supervised training stage. Experiments on two standard datasets show that MDLH achieves more effective results than competing methods in cross-modal retrieval.
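The pipeline sketched in the abstract — per-modality deep encoders mapping heterogeneous features into a shared representation, which is then binarized into hash codes compared by Hamming distance — can be illustrated with a minimal NumPy sketch. The layer sizes, random weights, and helper names here are hypothetical; the actual MDLH architecture, loss functions, and supervised fine-tuning procedure are detailed in the full paper.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


class ModalityEncoder:
    """Hypothetical two-layer encoder mapping one modality's features
    into a shared d-dimensional representation (weights random here;
    MDLH learns them with supervised fine-tuning)."""

    def __init__(self, in_dim, hidden_dim, shared_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, shared_dim))

    def forward(self, x):
        # tanh keeps outputs in [-1, 1], making binarization natural
        return np.tanh(relu(x @ self.W1) @ self.W2)


def hash_codes(shared_repr):
    """Binarize the shared representation into +/-1 hash codes."""
    return np.where(shared_repr >= 0, 1, -1)


def hamming_distance(a, b):
    """Number of differing bits; small distance suggests similar items."""
    return int(np.sum(a != b))


# Illustrative feature dimensions: e.g. a 128-d image descriptor
# and a 10-d text topic vector, mapped to 16-bit codes.
img_enc = ModalityEncoder(128, 64, 16, seed=1)
txt_enc = ModalityEncoder(10, 64, 16, seed=2)

img_feat = np.random.default_rng(3).random(128)
txt_feat = np.random.default_rng(4).random(10)

img_code = hash_codes(img_enc.forward(img_feat))
txt_code = hash_codes(txt_enc.forward(txt_feat))
print("Hamming distance:", hamming_distance(img_code, txt_code))
```

Because both modalities end up as codes of the same length in the same space, retrieval reduces to ranking database items by Hamming distance to the query's code, regardless of which modality the query came from.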




Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61402091, 61370074) and the Fundamental Research Funds for the Central Universities of China (Grant No. N140404012).

Correspondence to Ge Yu.


Cite this article

Qu, W., Wang, D., Feng, S. et al. A novel cross-modal hashing algorithm based on multimodal deep learning. Sci. China Inf. Sci. 60, 092104 (2017). https://doi.org/10.1007/s11432-015-0902-2
