Learning Joint Multimodal Representation with Adversarial Attention Networks

Published: 15 October 2018


Recently, learning a joint representation for multimodal data (e.g., data containing both visual content and a text description) has attracted extensive research interest. The features of different modalities are usually correlated and compositional, so a joint representation that captures their correlation is more effective than any subset of the features alone. However, most existing multimodal representation learning methods lack additional constraints to enhance the robustness of the learned representations. In this paper, a novel model, Adversarial Attention Networks (AAN), is proposed that incorporates both the attention mechanism and adversarial networks for effective and robust multimodal representation learning. Specifically, a visual-semantic attention model with a siamese learning strategy is proposed to encode the fine-grained correlation between the visual and textual modalities. Meanwhile, an adversarial learning model is employed to regularize the generated representation by matching its posterior distribution to given priors. The two modules are then combined into an integrated learning framework to learn the joint multimodal representation. Experimental results on two tasks, multi-label classification and tag recommendation, show that the proposed model outperforms state-of-the-art representation learning methods.
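The two modules described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the dot-product text-guided attention, the element-wise sum used to fuse the attended visual vector with the text vector, the linear discriminator, and the standard Gaussian prior are all assumptions made for illustration, and the siamese learning strategy and all training updates are omitted. The sketch only shows the forward computation: attention weights over image-region features, a joint vector z, and adversarial-autoencoder-style losses in which a discriminator separates prior samples from encoded representations while the encoder is pushed to fool it.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_attention(regions, text_vec):
    """Attend over visual region features by dot-product relevance to the text vector.

    Assumption: regions and text_vec already live in the same d-dimensional space.
    """
    weights = softmax([dot(r, text_vec) for r in regions])
    attended = [sum(w * r[k] for w, r in zip(weights, regions))
                for k in range(len(text_vec))]
    return attended, weights


def joint_representation(regions, text_vec):
    # Hypothetical fusion: element-wise sum of attended visual and textual features.
    attended, weights = text_guided_attention(regions, text_vec)
    return [a + t for a, t in zip(attended, text_vec)], weights


def sigmoid(x):
    # Branch on sign to avoid overflow in math.exp for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminator(z, w, b):
    """Hypothetical linear discriminator: estimated probability that z came from the prior."""
    return sigmoid(dot(w, z) + b)


def adversarial_losses(encoded, prior_samples, w, b, eps=1e-8):
    """Adversarial-autoencoder-style losses (an assumption, not the paper's exact objective).

    d_loss: the discriminator should output 1 for prior samples and 0 for encoded vectors.
    g_loss: the encoder is rewarded when its outputs are mistaken for prior samples.
    """
    real = sum(-math.log(discriminator(p, w, b) + eps) for p in prior_samples) / len(prior_samples)
    fake = sum(-math.log(1.0 - discriminator(z, w, b) + eps) for z in encoded) / len(encoded)
    g_loss = sum(-math.log(discriminator(z, w, b) + eps) for z in encoded) / len(encoded)
    return real + fake, g_loss


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]  # 3 image regions
    text = [random.gauss(0, 1) for _ in range(dim)]                          # 1 text embedding
    z, weights = joint_representation(regions, text)
    prior = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(8)]    # samples from N(0, I)
    d_loss, g_loss = adversarial_losses([z], prior, [0.1] * dim, 0.0)
    print("attention weights:", [round(w, 3) for w in weights])
    print("d_loss:", round(d_loss, 4), "g_loss:", round(g_loss, 4))
```

Matching the representation to a prior through a discriminator, rather than through an explicit divergence term, is the adversarial-autoencoder idea; whether AAN uses exactly this loss form, and how it weights it against the attention and siamese objectives, is not stated here and remains an assumption of the sketch.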


MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
Association for Computing Machinery, New York, NY, United States



Author Tags

  1. adversarial networks
  2. attention model
  3. multimodal
  4. representation learning
  5. siamese learning


Funding Sources

  • State Key Laboratory of Software Development Environment
  • National Natural Science Foundation of China
  • Beijing Natural Science Foundation of China


MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


