
Unsupervised Concept Learning in Text Subspace for Cross-Media Retrieval

  • Conference paper
Advances in Multimedia Information Processing – PCM 2017 (PCM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10735)


Abstract

Subspace learning (i.e. learning an image, text, or latent subspace) is an essential part of cross-media retrieval. Most existing methods map the different modalities into a latent subspace pre-defined by category labels. However, such labels require extensive manual annotation, and the label-defined subspace may not be expressive enough to represent the semantic information. In this paper, we propose a novel unsupervised concept learning approach in the text subspace for cross-media retrieval: images and texts are mapped into a conceptual text subspace by neural networks trained with self-learned concept labels, so the resulting text subspace is more reasonable and practical than a pre-defined latent subspace. Experiments on two benchmark datasets demonstrate that our method not only outperforms state-of-the-art unsupervised methods but also achieves better performance than several supervised methods.
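As a rough illustration of the idea sketched in the abstract, the Python snippet below shows one way such a pipeline could look: text features are clustered to produce self-learned concept labels (KMeans here is only a stand-in for whatever clustering the paper actually uses), a small regression network is trained to map image features onto the concept centroids in the text subspace, and retrieval then reduces to cosine similarity there. All feature arrays, dimensions, and model choices are hypothetical; this is a minimal sketch of the general recipe, not the authors' implementation.

# Minimal sketch of unsupervised concept learning in a text subspace.
# NOT the paper's exact method: features are random stand-ins, KMeans
# stands in for the clustering step, and an off-the-shelf MLP stands in
# for the mapping network.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)

# Hypothetical paired features for N image-text pairs: CNN image features
# and text embeddings (e.g. averaged word vectors), dimensions made up.
N, d_img, d_txt = 600, 512, 128
img_feats = rng.standard_normal((N, d_img)).astype(np.float32)
txt_feats = normalize(rng.standard_normal((N, d_txt)).astype(np.float32))

# Step 1: self-learned concept labels -- cluster the text features so each
# cluster centroid acts as one "concept"; no manual annotation is needed.
n_concepts = 10
kmeans = KMeans(n_clusters=n_concepts, n_init=10, random_state=0).fit(txt_feats)
concepts = normalize(kmeans.cluster_centers_)

# Step 2: train a network that maps image features into the text subspace,
# supervised only by each image's pseudo-labelled concept centroid.
targets = concepts[kmeans.labels_]
mapper = MLPRegressor(hidden_layer_sizes=(256,), max_iter=300,
                      random_state=0).fit(img_feats, targets)

# Step 3: cross-media retrieval by cosine similarity in the text subspace.
def image_to_text_retrieval(query_img, k=5):
    """Rank all texts for one image query; both sides live in the text subspace."""
    q = normalize(mapper.predict(query_img[None, :]))
    scores = (q @ txt_feats.T).ravel()  # rows are L2-normalized, so this is cosine
    return np.argsort(-scores)[:k]

print(image_to_text_retrieval(img_feats[0]))

A symmetric text-to-image query would map the text feature into the same subspace (here it already lives there) and rank the mapped image features instead.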


Notes

  1. https://en.wikipedia.org/wiki/Supervised_learning.

  2. https://en.wikipedia.org/wiki/Unsupervised_learning.

  3. http://vision.cs.uiuc.edu/pascal-sentences/.

  4. http://www.svcl.ucsd.edu/projects/crossmodal/.


Acknowledgement

This project was supported by Shenzhen Peacock Plan (20130408-183003656), Shenzhen Key Laboratory for Intelligent Multimedia and Virtual Reality (ZDSYS201703031405467), and Guangdong Science and Technology Project (2014B010117007).

Author information

Correspondence to Wenmin Wang.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Fan, M., Wang, W., Dong, P., Wang, R., Li, G. (2018). Unsupervised Concept Learning in Text Subspace for Cross-Media Retrieval. In: Zeng, B., Huang, Q., El Saddik, A., Li, H., Jiang, S., Fan, X. (eds) Advances in Multimedia Information Processing – PCM 2017. PCM 2017. Lecture Notes in Computer Science, vol 10735. Springer, Cham. https://doi.org/10.1007/978-3-319-77380-3_48

  • DOI: https://doi.org/10.1007/978-3-319-77380-3_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77379-7

  • Online ISBN: 978-3-319-77380-3

  • eBook Packages: Computer Science (R0)
