Understanding Multimedia Document Semantics for Cross-Media Retrieval

Wu, Fei; Yang, Yi; Zhuang, Yueting; Pan, Yunhe

doi:10.1007/11581772_87

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3767)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

A multimedia document (MMD), such as a Web page or a multimedia encyclopedia, is composed of media objects of different modalities, and its integrated semantics is expressed by the combination of all the media objects in it. Because the content of MMDs is enormous and their number is increasing rapidly, effective management of MMDs is in great demand. It is also useful to provide users with cross-media retrieval facilities, so that they can query media objects by examples of different modalities; for example, a user may query an MMD (or an image) by submitting an audio clip, and vice versa. Two challenges stand in the way of these goals. First, how can we represent an MMD and fuse its media objects together to build a cross-index and facilitate cross-media retrieval? Second, how can we understand MMD semantics? To address these two problems, we give a definition of MMD and propose a manifold learning method to discover MMD semantics. We first construct an MMD semi-semantic graph (SSG) and then apply multidimensional scaling to create an MMD semantic space (MMDSS). We also propose two stages of feedback: the first refines the SSG, and the second introduces new MMDs that are not yet in the MMDSS into it. Since all of the MMDs and their component media objects of different modalities lie in the MMDSS, cross-media retrieval can be performed easily. Experimental results are encouraging and indicate that the proposed approach is effective.
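The graph-then-embedding pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the SSG is built or weighted, so everything below is an assumption made for illustration: the graph is given as a symmetric matrix of edge dissimilarities, graph distances are taken as shortest-path lengths, and the embedding uses classical (eigendecomposition-based) multidimensional scaling. The function names and the toy five-node graph are hypothetical and are not taken from the paper.

```python
import numpy as np


def shortest_path_distances(weights):
    """All-pairs shortest paths (Floyd-Warshall) on a weighted graph.

    `weights` is a symmetric matrix of edge dissimilarities, with np.inf
    where two nodes share no edge. Assumes the graph is connected.
    """
    dist = np.array(weights, dtype=float)
    np.fill_diagonal(dist, 0.0)
    for k in range(dist.shape[0]):
        # Allow paths that pass through node k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def classical_mds(dist, dim):
    """Embed points in `dim` dimensions from a matrix of pairwise distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]            # largest eigenvalues first
    top_vals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)


def nearest(coords, query_index, k):
    """Indices of the k points closest to the query in the embedded space."""
    d = np.linalg.norm(coords - coords[query_index], axis=1)
    d[query_index] = np.inf                            # exclude the query itself
    return np.argsort(d)[:k]


# Toy graph (hypothetical): five nodes in a chain, unit dissimilarity per edge.
# Nodes could stand for MMDs or media objects of any modality.
toy_weights = np.full((5, 5), np.inf)
for i in range(4):
    toy_weights[i, i + 1] = toy_weights[i + 1, i] = 1.0

graph_dist = shortest_path_distances(toy_weights)
coords = classical_mds(graph_dist, dim=1)
neighbours = nearest(coords, query_index=0, k=2)
```

Because every node is placed in the same coordinate space regardless of modality, a query reduces to a nearest-neighbour lookup around the query node, which is the property the abstract relies on for cross-media retrieval. The feedback stages mentioned in the abstract are not modelled in this sketch.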

Author information

Authors and Affiliations

College of Computer Science and Engineering, Zhejiang University, Hangzhou, P.R. China
Fei Wu, Yi Yang, Yueting Zhuang & Yunhe Pan

Editor information

Editors and Affiliations

Gwangju Institute of Science and Technology (GIST), 1 Oryong-dong Buk-gu, 500-712, Gwangju, Korea
Yo-Sung Ho
Multimedia Security Lab, Korea University, Science Campus, 136-701, Seoul, Korea
Hyoung Joong Kim

Cite this paper

Wu, F., Yang, Y., Zhuang, Y., Pan, Y. (2005). Understanding Multimedia Document Semantics for Cross-Media Retrieval. In: Ho, YS., Kim, H.J. (eds) Advances in Multimedia Information Processing - PCM 2005. PCM 2005. Lecture Notes in Computer Science, vol 3767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581772_87

DOI: https://doi.org/10.1007/11581772_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30027-4
Online ISBN: 978-3-540-32130-9
eBook Packages: Computer Science, Computer Science (R0)
