Bridging the Gap Between Visual and Auditory Feature Spaces for Cross-Media Retrieval

  • Conference paper
Advances in Multimedia Modeling (MMM 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4351)

Abstract

Cross-media retrieval seeks to break through the limitation of modality so that users can query multimedia objects with examples of a different modality. In this paper we present a novel approach to learning the underlying correlation between visual and auditory feature spaces for cross-media retrieval. We describe a Semi-Supervised Correlation Preserving Mapping (SSCPM) that learns an isomorphic SSCPM subspace in which the canonical correlations between the original visual and auditory features are preserved as far as possible. Based on user interactions through relevance feedback, local semantic clusters are formed for images and for audio clips respectively. By dynamically spreading the ranking scores of positive and negative examples, cross-media semantic correlations are refined and the cross-media distance is accurately estimated. Experimental results are encouraging and indicate that the approach is effective.
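
To make the idea of a correlation-preserving subspace concrete, the sketch below runs plain linear canonical correlation analysis (CCA) on paired visual and auditory feature vectors and then measures a distance between an image and audio clips in the shared subspace. It is an illustration only and not the SSCPM method itself: it includes no semi-supervision, no relevance feedback, and no spreading of ranking scores. The function names `cca` and `cosine_distances`, the regularization term `reg`, the choice of cosine distance, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def cca(X, Y, n_components=2, reg=1e-6):
    """Plain linear CCA via whitening and SVD (illustrative, not SSCPM).

    X: (n, p) visual features, Y: (n, q) auditory features; row i of X is
    assumed to be paired with row i of Y.
    Returns projection matrices Wx (p, k), Wy (q, k) and the top-k
    canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized covariance and cross-covariance estimates.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, sorted in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_components]
    Wy = Ky @ Vt.T[:, :n_components]
    return Wx, Wy, s[:n_components]


def cosine_distances(query, items):
    """Cosine distance from one projected query to each projected item."""
    q = query / np.linalg.norm(query)
    M = items / np.linalg.norm(items, axis=1, keepdims=True)
    return 1.0 - M @ q


if __name__ == "__main__":
    # Synthetic paired data sharing a 2-D latent cause (assumption).
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 2))
    images = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(500, 8))
    audio = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(500, 5))

    Wx, Wy, corr = cca(images, audio)
    img_sub = (images - images.mean(axis=0)) @ Wx
    aud_sub = (audio - audio.mean(axis=0)) @ Wy

    # Rank audio clips for the first image used as a query.
    ranking = np.argsort(cosine_distances(img_sub[0], aud_sub))
    print("canonical correlations:", corr)
    print("five nearest audio clips:", ranking[:5])
```

In this sketch, images and audio clips become directly comparable only after both are projected into the common subspace. A method such as the one described above would additionally adjust that subspace or the distances in it using user feedback.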

This research is supported by the National Natural Science Foundation of China (No. 60533090, No. 60525108), the Science and Technology Project of Zhejiang Province (2005C13032, 2005C11001-05), and the China-US Million Book Digital Library Project (www.cadal.zju.edu.cn).

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, H., Wu, F. (2006). Bridging the Gap Between Visual and Auditory Feature Spaces for Cross-Media Retrieval. In: Cham, TJ., Cai, J., Dorai, C., Rajan, D., Chua, TS., Chia, LT. (eds) Advances in Multimedia Modeling. MMM 2007. Lecture Notes in Computer Science, vol 4351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69423-6_58

  • DOI: https://doi.org/10.1007/978-3-540-69423-6_58

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69421-2

  • Online ISBN: 978-3-540-69423-6

  • eBook Packages: Computer Science, Computer Science (R0)
