Abstract
Cross-media retrieval is an interesting research problem that seeks to break through the limitation of modality, so that users can query multimedia objects by examples of a different modality. In this paper we present a novel approach to learning the underlying correlation between visual and auditory feature spaces for cross-media retrieval. A semi-supervised Correlation Preserving Mapping (SSCPM) is described that learns an isomorphic SSCPM subspace in which the canonical correlations between the original visual and auditory features are maximally preserved. Based on user interactions during relevance feedback, local semantic clusters are formed for images and audio clips respectively. As the ranking scores of positive and negative examples are dynamically spread, cross-media semantic correlations are refined and cross-media distances are estimated accurately. Experimental results are encouraging and show that our approach is effective.
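The abstract does not give the SSCPM formulation. As a point of reference only, the sketch below uses plain, unsupervised canonical correlation analysis (CCA), the technique whose correlations SSCPM is described as preserving, to map paired visual and auditory feature vectors into a shared subspace and then rank audio clips against an image query by Euclidean distance in that subspace. It is not the authors' semi-supervised method and it omits relevance feedback entirely; the function names `fit_cca` and `cross_media_rank`, the ridge term `reg`, and the synthetic "latent semantics" data are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: plain CCA, NOT the paper's SSCPM (assumption).


def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(X, Y, k, reg=1e-6):
    """Unsupervised CCA between paired views X (n x p) and Y (n x q).

    Returns projections A (p x k) and B (q x k), the view means, and the
    top-k canonical correlations in descending order. `reg` is a
    hypothetical ridge term that keeps the covariances invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten each view; the SVD of the whitened cross-covariance gives
    # the canonical direction pairs (columns of U, V) and correlations (s).
    Wx, Wy = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return A, B, mx, my, s[:k]


def cross_media_rank(query_img, audio_feats, A, B, mx, my):
    """Rank audio clips by distance to an image query in the shared subspace."""
    q = (query_img - mx) @ A
    Z = (audio_feats - my) @ B
    dists = np.linalg.norm(Z - q, axis=1)
    return np.argsort(dists)  # most similar audio clip first


# Synthetic paired data: both views are noisy linear images of a
# hypothetical 2-D shared latent variable (assumption for illustration).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X_img = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(100, 10))
Y_aud = latent @ rng.normal(size=(2, 8)) + 0.01 * rng.normal(size=(100, 8))

A, B, mx, my, corrs = fit_cca(X_img, Y_aud, k=2)
ranking = cross_media_rank(X_img[0], Y_aud, A, B, mx, my)
print(corrs, ranking[:5])
```

Because both views here share a clean latent signal, the two canonical correlations come out close to 1 and an image query usually retrieves its own paired audio clip first. The approach in the abstract goes further by using relevance feedback to form local semantic clusters and refine these correlations, which a fixed linear projection like this one cannot do.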
This research is supported by the National Natural Science Foundation of China (No.60533090, No.60525108), the Science and Technology Project of Zhejiang Province (2005C13032, 2005C11001-05), and the China-US Million Book Digital Library Project (www.cadal.zju.edu.cn).
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Wu, F. (2006). Bridging the Gap Between Visual and Auditory Feature Spaces for Cross-Media Retrieval. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69423-6_58
DOI: https://doi.org/10.1007/978-3-540-69423-6_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69421-2
Online ISBN: 978-3-540-69423-6
eBook Packages: Computer Science, Computer Science (R0)