
Dual Subspaces with Adversarial Learning for Cross-Modal Retrieval

  • Conference paper
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)


Abstract

Learning an effective subspace, whether visual, textual, or latent, in which to measure the correlation between items from different modalities lies at the core of cross-modal retrieval. However, data from different modalities are both imbalanced and complementary: images carry abundant spatial information, while text supplies more background and contextual detail. In this paper, we propose a model with dual parallel subspaces, a visual subspace and a textual subspace, to better preserve modality-specific information. Triplet constraints are employed to minimize the semantic gap between items from different modalities that share the same concept, while maximizing the gap for image-text pairs with different concepts in the corresponding subspace. We then combine adversarial learning with the dual subspaces as an interplay between two agents. The first agent, the dual subspaces with similarity merging and concept prediction, aims to narrow the difference between the data distributions of the two modalities, under the constraint of concept invariance, so as to fool the second agent, a modality discriminator that tries to distinguish images from text. Extensive experiments on the Wikipedia and NUS-WIDE-10k datasets verify the effectiveness of the proposed model, which outperforms state-of-the-art methods on cross-modal retrieval tasks.
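To make the two-agent interplay concrete, the following is a minimal PyTorch sketch reconstructed from the abstract alone: two subspace projectors with concept-prediction heads are trained with triplet constraints and an adversarial objective against a modality discriminator. All network sizes, the margin, and the loss weights are illustrative assumptions rather than the authors' configuration, and the similarity-merging step across the two parallel subspaces is omitted for brevity; this is a sketch of the general setup, not the authors' implementation.

```python
# Hedged sketch of the dual-subspace adversarial setup described in the abstract.
# Dimensions, margins, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubspaceProjector(nn.Module):
    """Projects one modality's features into a subspace and predicts concepts."""
    def __init__(self, in_dim, sub_dim, num_concepts):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                  nn.Linear(1024, sub_dim))
        self.concept_head = nn.Linear(sub_dim, num_concepts)

    def forward(self, x):
        z = F.normalize(self.proj(x), dim=-1)   # unit-norm embedding
        return z, self.concept_head(z)

class ModalityDiscriminator(nn.Module):
    """Tries to tell whether an embedding came from the image or the text branch."""
    def __init__(self, sub_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sub_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z)  # logit: > 0 means "image"

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull concept-matched cross-modal pairs together, push mismatched pairs apart."""
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# Toy batch: image features (e.g. 4096-d CNN) and text features (e.g. 300-d topic vectors).
img_net, txt_net = SubspaceProjector(4096, 128, 10), SubspaceProjector(300, 128, 10)
disc = ModalityDiscriminator(128)
opt_g = torch.optim.Adam(list(img_net.parameters()) + list(txt_net.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

img, txt, labels = torch.randn(8, 4096), torch.randn(8, 300), torch.randint(0, 10, (8,))
neg_txt, neg_img = txt.roll(1, 0), img.roll(1, 0)  # concept-different negatives (toy)

# --- Agent 1: dual subspaces (triplet + concept losses, try to fool the discriminator) ---
zi, ci = img_net(img)
zt, ct = txt_net(txt)
zt_neg, _ = txt_net(neg_txt)
zi_neg, _ = img_net(neg_img)
loss_tri = triplet_loss(zi, zt, zt_neg) + triplet_loss(zt, zi, zi_neg)
loss_cls = F.cross_entropy(ci, labels) + F.cross_entropy(ct, labels)  # concept invariance
# The projectors want the discriminator to mislabel the modality of each embedding.
loss_adv = F.binary_cross_entropy_with_logits(disc(zi), torch.zeros(8, 1)) + \
           F.binary_cross_entropy_with_logits(disc(zt), torch.ones(8, 1))
opt_g.zero_grad(); (loss_tri + loss_cls + 0.1 * loss_adv).backward(); opt_g.step()

# --- Agent 2: modality discriminator (distinguish image embeddings from text embeddings) ---
zi, _ = img_net(img)
zt, _ = txt_net(txt)
loss_d = F.binary_cross_entropy_with_logits(disc(zi.detach()), torch.ones(8, 1)) + \
         F.binary_cross_entropy_with_logits(disc(zt.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```

In practice the two update steps would alternate over mini-batches of paired image and text features, with negatives drawn from items of a different concept rather than a simple batch roll.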


Notes

  1. http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

  2. http://www.svcl.ucsd.edu/projects/crossmodal/


Acknowledgement

This work was supported by the Shenzhen Peacock Plan (20130408-183003656), the Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), and the National Natural Science Foundation of China (NSFC, No. U1613209).

Author information


Corresponding author

Correspondence to Wenmin Wang.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Xia, Y., Wang, W., Han, L. (2018). Dual Subspaces with Adversarial Learning for Cross-Modal Retrieval. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_60


  • DOI: https://doi.org/10.1007/978-3-030-00776-8_60

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science, Computer Science (R0)
