Boosting Multimodal Semantic Understanding by Local Similarity Adaptation and Global Correlation Propagation

  • Conference paper
Advances in Multimedia Information Processing - PCM 2010 (PCM 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6297))

Abstract

An important trend in multimedia semantic understanding is the use and support of multimodal data whose low-level features are heterogeneous, such as images and audio. The main challenge is how to measure the different kinds of correlations among multimodal data. In this paper, we propose a novel approach to boost multimodal semantic understanding from local and global perspectives. First, cross-media correlation between images and audio clips is estimated with Kernel Canonical Correlation Analysis; second, a multimodal graph is constructed to enable global correlation propagation with adapted intra-media similarity; finally, a cross-media retrieval algorithm is discussed as an application of our approach. A prototype system is developed to demonstrate its feasibility and capability. Experimental results are encouraging and indicate that the approach is effective.
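To make the idea of global correlation propagation over a multimodal graph concrete, the following is a minimal sketch and not the algorithm of this paper. It swaps in a generic normalized graph-propagation iteration (in the style of manifold ranking), and the function name `propagate`, the damping factor `alpha`, and all edge weights are hypothetical. Nodes 0 and 1 stand for images, nodes 2 and 3 for audio clips; the image-to-audio weight plays the role of a cross-media correlation that would, in the paper, come from Kernel Canonical Correlation Analysis.

```python
import math

def propagate(weights, seeds, alpha=0.5, iterations=50):
    """Spread seed scores over a graph via f <- alpha * S f + (1 - alpha) * y.

    Generic propagation scheme used for illustration only; it is an
    assumption, not the update rule described in the paper.
    """
    n = len(weights)
    # Node degrees; isolated nodes get degree 1.0 to avoid division by zero.
    degrees = [sum(row) or 1.0 for row in weights]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    s = [[weights[i][j] / math.sqrt(degrees[i] * degrees[j]) for j in range(n)]
         for i in range(n)]
    scores = list(seeds)
    for _ in range(iterations):
        scores = [alpha * sum(s[i][j] * scores[j] for j in range(n))
                  + (1 - alpha) * seeds[i]
                  for i in range(n)]
    return scores

# Hypothetical multimodal graph: nodes 0-1 are images, nodes 2-3 are audio clips.
weights = [
    [0.0, 0.8, 0.9, 0.0],  # image 0: intra-media edge to image 1, cross-media edge to audio 2
    [0.8, 0.0, 0.0, 0.0],  # image 1
    [0.9, 0.0, 0.0, 0.7],  # audio 2: intra-media edge to audio 3
    [0.0, 0.0, 0.7, 0.0],  # audio 3: no direct edge to any image
]
seeds = [1.0, 0.0, 0.0, 0.0]  # the query is image 0
scores = propagate(weights, seeds)
```

In this toy graph, audio clip 3 has no direct link to the query image, yet it receives a positive score through audio clip 2. This is the sense in which propagation is global: correlations reach items that pairwise cross-media estimates alone would miss.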

This work is supported by the Scientific Research Project funded by the Education Department of Hubei Province (Q20091101) and the Science Foundation of Wuhan University of Science and Technology (2008TD04).

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Liu, X. (2010). Boosting Multimodal Semantic Understanding by Local Similarity Adaptation and Global Correlation Propagation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_14

  • DOI: https://doi.org/10.1007/978-3-642-15702-8_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15701-1

  • Online ISBN: 978-3-642-15702-8

  • eBook Packages: Computer Science, Computer Science (R0)
