Concept-Based Multimodal Learning for Topic Generation

  • Conference paper
MultiMedia Modeling (MMM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8935)

Abstract

In this paper, we propose a concept-based multimodal learning model (CMLM) that generates document topics by modeling textual and visual data. The model takes cross-modal concept similarity and unlabeled image concepts into account, and it can process documents in which one modality is missing. It extracts semantic concepts from unlabeled images and combines them with the text modality to generate document topics. Our comparison experiments on news document topic generation show that, in a multimodal scenario, CMLM generates more representative topics for a given document than latent Dirichlet allocation (LDA) based topic generation.
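The abstract does not spell out how CMLM combines the two modalities, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes that image concepts arrive as labels with detector confidences, that every concept and word has a sparse vector in a shared space (the `concept_vectors` dictionary is hypothetical), and that cosine similarity stands in for cross-modal concept similarity. When one modality is missing, the ranking falls back to the other.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_topics(text_tokens, image_concepts, concept_vectors, top_k=3):
    """Rank candidate topic concepts for one document (illustrative only).

    text_tokens: list of words from the text modality (may be empty).
    image_concepts: dict concept -> detector confidence (may be empty).
    concept_vectors: hypothetical dict concept -> sparse vector (dict).
    """
    text_counts = Counter(text_tokens)
    total = sum(text_counts.values())
    # Text terms are scored by relative frequency.
    scores = {w: c / total for w, c in text_counts.items()} if total else {}

    for concept, confidence in image_concepts.items():
        if text_counts:
            # Weight an image concept by its best similarity to any text term.
            sim = max(
                cosine(concept_vectors.get(concept, {}), concept_vectors.get(w, {}))
                for w in text_counts
            )
        else:
            # Text modality missing: rely on detector confidence alone.
            sim = 1.0
        scores[concept] = scores.get(concept, 0.0) + confidence * sim

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_k]


# Toy data, invented for this example.
vectors = {
    "flood": {"water": 1.0, "disaster": 1.0},
    "river": {"water": 1.0},
    "rescue": {"disaster": 1.0},
    "cat": {"animal": 1.0},
}
text = ["flood", "rescue", "flood"]
image = {"river": 0.9, "cat": 0.8}

both = rank_topics(text, image, vectors)
image_only = rank_topics([], image, vectors)
text_only = rank_topics(text, {}, vectors)
```

In this toy run, the image concept "river" is promoted because it is close to the text term "flood", while "cat" contributes nothing because it is unrelated to the text. A real system would learn these similarities from data rather than use hand-written vectors.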

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, C., Yang, H., Che, X., Meinel, C. (2015). Concept-Based Multimodal Learning for Topic Generation. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8935. Springer, Cham. https://doi.org/10.1007/978-3-319-14445-0_33

  • DOI: https://doi.org/10.1007/978-3-319-14445-0_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14444-3

  • Online ISBN: 978-3-319-14445-0

  • eBook Packages: Computer Science, Computer Science (R0)
