DOI: 10.1145/3132847.3133092

Common-Specific Multimodal Learning for Deep Belief Network

Published: 06 November 2017

Abstract

The multimodal Deep Belief Network (DBN) has been widely used to extract representations from multimodal data by fusing the high-level features of each modality into a common representation. Such a straightforward fusion strategy can benefit classification and information retrieval tasks. However, it may introduce noise when the high-level features are not naturally common, and hence not fusable, across modalities. Intuitively, each modality may have its own specific features with their own representational capabilities, and these should not simply be fused. It is therefore more reasonable to fuse only the common features and to represent the multimodal data by both the fused features and the modality-specific features. Distinguishing common features from modality-specific features is a challenging task for traditional DBN models, in which all features are crudely mixed. This paper proposes the Common-Specific Multimodal Deep Belief Network (CS-DBN) to solve this problem. CS-DBN automatically separates common features from modality-specific features and fuses only the common ones for data representation. Experimental results demonstrate the superiority of CS-DBN over baseline approaches on classification tasks.
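To make the separate-then-fuse idea concrete, the following minimal NumPy sketch illustrates the representation step under stated assumptions: the layer widths, the common/specific split sizes, and the single sigmoid joint layer over the concatenated common units are hypothetical choices for illustration. The paper's CS-DBN learns the separation automatically, whereas this sketch simply fixes the split by index.

# A minimal NumPy sketch of the common-specific fusion idea from the
# abstract. Layer sizes, the common/specific split, and the joint layer
# are illustrative assumptions, not the paper's exact architecture.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden(v, W, b):
    """Mean-field hidden activations of an RBM given a visible vector v."""
    return sigmoid(v @ W + b)

# Per-modality dimensions (hypothetical): each modality's top hidden layer
# is split into n_common units (fusable) and n_specific units (kept apart).
n_img, n_txt = 128, 64          # visible sizes for image / text features
n_hidden = 32                   # top hidden layer size per modality
n_common, n_specific = 20, 12   # split of each modality's hidden layer

# Randomly initialised weights stand in for pretrained RBM parameters.
W_img, b_img = rng.normal(0, 0.1, (n_img, n_hidden)), np.zeros(n_hidden)
W_txt, b_txt = rng.normal(0, 0.1, (n_txt, n_hidden)), np.zeros(n_hidden)

# The joint layer fuses ONLY the common units of both modalities.
W_joint = rng.normal(0, 0.1, (2 * n_common, n_common))
b_joint = np.zeros(n_common)

def cs_dbn_representation(v_img, v_txt):
    h_img = rbm_hidden(v_img, W_img, b_img)
    h_txt = rbm_hidden(v_txt, W_txt, b_txt)
    # Separate common from modality-specific units (fixed by index here;
    # CS-DBN learns this separation automatically).
    c_img, s_img = h_img[:n_common], h_img[n_common:]
    c_txt, s_txt = h_txt[:n_common], h_txt[n_common:]
    # Fuse only the common parts through the joint layer.
    fused = sigmoid(np.concatenate([c_img, c_txt]) @ W_joint + b_joint)
    # Final representation: fused common features plus each modality's
    # specific features, which are kept unfused.
    return np.concatenate([fused, s_img, s_txt])

rep = cs_dbn_representation(rng.random(n_img), rng.random(n_txt))
print(rep.shape)  # (n_common + 2 * n_specific,) = (44,)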



Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. deep belief network
  2. multimodal data
  3. representation learning

Qualifiers

  • Short-paper

Funding Sources

  • National Basic Research Program of China
  • Tsinghua-CISCO Joint Laboratory Project
  • Key Technology R&D Program of Shenyang

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%



Article Metrics

  • Total Citations: 0
  • Total Downloads: 135
  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0

Reflects downloads up to 15 Feb 2025
