Abstract
An important trend in multimedia semantic understanding is the use of multimodal data, such as images and audio, whose low-level features are heterogeneous. The main challenge is how to measure the different kinds of correlations among multimodal data. In this paper, we propose a novel approach that boosts multimodal semantic understanding from both local and global perspectives. First, the cross-media correlation between images and audio clips is estimated with Kernel Canonical Correlation Analysis. Second, a multimodal graph is constructed to enable global correlation propagation with adapted intra-media similarity. Finally, a cross-media retrieval algorithm is discussed as an application of our approach. A prototype system is developed to demonstrate its feasibility and capability. Experimental results are encouraging and show that our approach is effective.
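To make the first step more concrete, the following is a minimal sketch of regularized Kernel Canonical Correlation Analysis in its common textbook dual form. It is not taken from the paper: the RBF kernel, the `gamma` and `reg` values, the helper names (`rbf_kernel`, `center_kernel`, `kcca`) and the synthetic "image" and "audio" features are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def kcca(Kx, Ky, reg=0.1, n_components=1):
    """Regularized KCCA on two centered kernel matrices.

    Solves (Kx + reg*I)^-1 Ky (Ky + reg*I)^-1 Kx alpha = rho^2 alpha and
    returns the dual weights alpha, beta and the correlations rho.
    """
    n = Kx.shape[0]
    I = np.eye(n)
    M = np.linalg.solve(Kx + reg * I, Ky) @ np.linalg.solve(Ky + reg * I, Kx)
    eigvals, eigvecs = np.linalg.eig(M)
    # Keep the largest eigenvalues; discard tiny imaginary parts from round-off.
    order = np.argsort(-eigvals.real)[:n_components]
    rho = np.sqrt(np.clip(eigvals.real[order], 0.0, 1.0))
    alpha = eigvecs[:, order].real
    beta = np.linalg.solve(Ky + reg * I, Kx @ alpha) / np.maximum(rho, 1e-12)
    return alpha, beta, rho


# Synthetic stand-ins for paired image and audio features (hypothetical data).
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(60, 1))  # shared latent "semantics"
X = np.hstack([t, t ** 2]) + 0.05 * rng.normal(size=(60, 2))
Y = np.hstack([np.sin(3 * t), np.cos(3 * t)]) + 0.05 * rng.normal(size=(60, 2))

Kx = center_kernel(rbf_kernel(X))
Ky = center_kernel(rbf_kernel(Y))
alpha, beta, rho = kcca(Kx, Ky, reg=0.1, n_components=2)

# Project both modalities onto the first pair of canonical directions.
u, v = Kx @ alpha[:, 0], Ky @ beta[:, 0]
print("canonical correlations:", rho)
print("correlation of projections:", np.corrcoef(u, v)[0, 1])
```

The projections `u` and `v` place both modalities in a shared subspace, where a cross-media similarity could be measured. How the paper builds its multimodal graph on top of such correlations is not described in the abstract, so that step is not sketched here.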
This work is supported by the Scientific Research Project funded by the Education Department of Hubei Province (Q20091101) and the Science Foundation of Wuhan University of Science and Technology (2008TD04).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Liu, X. (2010). Boosting Multimodal Semantic Understanding by Local Similarity Adaptation and Global Correlation Propagation. In: Qiu, G., Lam, K.M., Kiya, H., Xue, XY., Kuo, CC.J., Lew, M.S. (eds) Advances in Multimedia Information Processing - PCM 2010. PCM 2010. Lecture Notes in Computer Science, vol 6297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15702-8_14
DOI: https://doi.org/10.1007/978-3-642-15702-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15701-1
Online ISBN: 978-3-642-15702-8
eBook Packages: Computer Science, Computer Science (R0)