Bridging the Gap Between Visual and Auditory Feature Spaces for Cross-Media Retrieval

Zhang, Hong; Wu, Fei

doi:10.1007/978-3-540-69423-6_58


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4351)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Cross-media retrieval seeks to break through the limitation of modality so that users can query multimedia objects with examples of a different modality. In this paper we present a novel approach to learning the underlying correlation between visual and auditory feature spaces for cross-media retrieval. A Semi-Supervised Correlation Preserving Mapping (SSCPM) is described that learns an isomorphic SSCPM subspace in which the canonical correlations between the original visual and auditory features are preserved as far as possible. Based on user interactions through relevance feedback, local semantic clusters are formed for images and for audio clips. As the ranking scores of positive and negative examples spread dynamically, cross-media semantic correlations are refined and the cross-media distance is estimated accurately. Experimental results are encouraging and indicate that the approach is effective.
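
The idea of a shared subspace that preserves canonical correlations can be illustrated with a minimal sketch. The code below is plain linear canonical correlation analysis (CCA) on synthetic data, not the authors' SSCPM: it has no semi-supervised component and no relevance feedback. The function names `fit_cca` and `cross_media_distance`, the regularization constant, the feature dimensions, and the use of cosine distance in the subspace are all assumptions made for illustration.

```python
import numpy as np

def fit_cca(X, Y, k, reg=1e-3):
    """Plain linear CCA via whitening and SVD (illustrative, not SSCPM)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return mx, my, Wx, Wy, s[:k]

def cross_media_distance(img_feat, aud_feat, model):
    """Cosine distance between an image and an audio clip in the shared subspace."""
    mx, my, Wx, Wy, _ = model
    a = (img_feat - mx) @ Wx
    b = (aud_feat - my) @ Wy
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return 1.0 - cos

# Synthetic stand-ins: 6-D "visual" and 4-D "auditory" features that
# share a 2-D latent semantic factor (all values are made up).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

model = fit_cca(X, Y, k=2)
print("canonical correlations:", model[4])
print("distance, paired items:", cross_media_distance(X[0], Y[0], model))
print("distance, unpaired items:", cross_media_distance(X[0], Y[1], model))
```

In this toy setting, an image and an audio clip generated from the same latent factor should on average lie closer together in the subspace than mismatched pairs. This is the property a cross-media distance needs before any feedback-driven refinement is applied.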

This research is supported by the National Natural Science Foundation of China (No. 60533090, No. 60525108), the Science and Technology Project of Zhejiang Province (2005C13032, 2005C11001-05), and the China-US Million Book Digital Library Project (www.cadal.zju.edu.cn).

Author information

Authors and Affiliations

The Institute of Artificial Intelligence, Zhejiang University, HangZhou, 310027, P.R. China
Hong Zhang & Fei Wu

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, Block N4, Nanyang Avenue, 639798, Singapore
Tat-Jen Cham & Deepu Rajan
School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Jianfei Cai
IBM T.J. Watson Research Center, Yorktown Heights, P.O. Box 704, 10598, New York, USA
Chitra Dorai
National University of Singapore, 3 Science Dr, 117543, Singapore
Tat-Seng Chua
Center for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, 639798, Singapore
Liang-Tien Chia

About this paper

Cite this paper

Zhang, H., Wu, F. (2006). Bridging the Gap Between Visual and Auditory Feature Spaces for Cross-Media Retrieval. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69423-6_58

DOI: https://doi.org/10.1007/978-3-540-69423-6_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69421-2
Online ISBN: 978-3-540-69423-6
eBook Packages: Computer Science, Computer Science (R0)
