DOI: 10.1145/3132847.3133092

Common-Specific Multimodal Learning for Deep Belief Network

Published: 06 November 2017

Abstract

The multimodal Deep Belief Network (DBN) has been widely used to extract representations from multimodal data by fusing the high-level features of each modality into a common representation. Such a straightforward fusion strategy can benefit classification and information retrieval tasks. However, it may introduce noise when the high-level features are not naturally common, and hence not fusable, across modalities. Intuitively, each modality may have its own specific features with their own representational capabilities, and these should not simply be fused. It is therefore more reasonable to fuse only the common features and to represent the multimodal data by both the fused features and the modality-specific features. Distinguishing common features from modality-specific features is a challenging task for traditional DBN models, in which all features are crudely mixed. This paper proposes the Common-Specific Multimodal Deep Belief Network (CS-DBN) to solve this problem. CS-DBN automatically separates common features from modality-specific features and fuses only the common ones for data representation. Experimental results demonstrate the superiority of CS-DBN over baseline approaches on classification tasks.
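To make the separate-then-fuse idea concrete, the following minimal NumPy sketch illustrates the representation step under stated assumptions: the layer widths, the common/specific split sizes, and the single sigmoid joint layer over the concatenated common units are hypothetical choices for illustration. The paper's CS-DBN learns the separation automatically, whereas this sketch simply fixes the split by index.

# A minimal NumPy sketch of the common-specific fusion idea from the
# abstract. Layer sizes, the common/specific split, and the joint layer
# are illustrative assumptions, not the paper's exact architecture.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_hidden(v, W, b):
    """Mean-field hidden activations of an RBM given a visible vector v."""
    return sigmoid(v @ W + b)

# Per-modality dimensions (hypothetical): each modality's top hidden layer
# is split into n_common units (fusable) and n_specific units (kept apart).
n_img, n_txt = 128, 64          # visible sizes for image / text features
n_hidden = 32                   # top hidden layer size per modality
n_common, n_specific = 20, 12   # split of each modality's hidden layer

# Randomly initialised weights stand in for pretrained RBM parameters.
W_img, b_img = rng.normal(0, 0.1, (n_img, n_hidden)), np.zeros(n_hidden)
W_txt, b_txt = rng.normal(0, 0.1, (n_txt, n_hidden)), np.zeros(n_hidden)

# The joint layer fuses ONLY the common units of both modalities.
W_joint = rng.normal(0, 0.1, (2 * n_common, n_common))
b_joint = np.zeros(n_common)

def cs_dbn_representation(v_img, v_txt):
    h_img = rbm_hidden(v_img, W_img, b_img)
    h_txt = rbm_hidden(v_txt, W_txt, b_txt)
    # Separate common from modality-specific units (fixed by index here;
    # CS-DBN learns this separation automatically).
    c_img, s_img = h_img[:n_common], h_img[n_common:]
    c_txt, s_txt = h_txt[:n_common], h_txt[n_common:]
    # Fuse only the common parts through the joint layer.
    fused = sigmoid(np.concatenate([c_img, c_txt]) @ W_joint + b_joint)
    # Final representation: fused common features plus each modality's
    # specific features, which are kept unfused.
    return np.concatenate([fused, s_img, s_txt])

rep = cs_dbn_representation(rng.random(n_img), rng.random(n_txt))
print(rep.shape)  # (n_common + 2 * n_specific,) = (44,)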



Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. deep belief network
  2. multimodal data
  3. representation learning

Qualifiers

  • Short-paper

Funding Sources

  • National Basic Research Program of China
  • Tsinghua-CISCO Joint Laboratory Project
  • Key Technology R&D Program of Shenyang

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate: 171 of 855 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%



Article Metrics

  • Total Citations: 0
  • Total Downloads: 135
  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0

Reflects downloads up to 15 Feb 2025
