
Toward better retrieval and presentation by exploring cross-media correlations

Multimedia Systems

Abstract.

This paper addresses explicit and implicit correlations between the media streams in a composite multimedia document, the so-called navigated hypermedia document in our language learning system, in order to facilitate document retrieval and synchronized presentation. To replay a recorded lecture in a form as close as possible to the original classroom experience, we devised a capturing mechanism that explicitly records all the lecturing media streams and the relations between them, including the instructor’s voice, slide changes in the HTML lectures, and various guiding actions (e.g., tele-pointers, pen strokes, document scrolling, keyword highlighting, and text annotations) on HTML-based slides. In addition, for more effective learning, we study three aspects of the implicit correlations that are inherently hidden between the media involved: temporal, spatial, and content relations. The implicit relations are discovered by three purpose-built processes: the speech-text alignment process for temporally synchronized speech-text presentation, the automatic scrolling process for spatial synchronization of the viewing window, and the content dependency checking process to ensure consistency of the processed content and the relations involved. The experimental results show that exploring cross-media correlations is helpful for system development in document presentation and retrieval. Users can replay a vivid, pedagogically effective multimedia lecture and can easily access the desired part of the document via cross-media indexing. The results have accordingly been applied to the development of online multimedia language learning systems aimed at improving students’ English and Chinese language capabilities.
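To make the idea of cross-media indexing concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the captured streams (slide changes, scrolling, aligned spoken words) can be flattened into timestamped events on one timeline; the names `Event`, `LectureIndex`, `state_at`, and `seek_word`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One captured event (hypothetical record layout, not from the paper)."""
    time: float  # seconds from the start of the recording
    kind: str    # e.g. "slide", "scroll", "word"
    value: str   # payload: slide file, scroll offset, spoken word, ...


class LectureIndex:
    """Hypothetical index over all events of one recorded lecture."""

    def __init__(self, events):
        # Keep events sorted by time so lookups can use binary search.
        self.events = sorted(events, key=lambda e: e.time)
        self.times = [e.time for e in self.events]

    def state_at(self, t):
        """Return the latest value of each event kind at or before time t."""
        state = {}
        for e in self.events[:bisect_right(self.times, t)]:
            state[e.kind] = e.value
        return state

    def seek_word(self, word):
        """Return the times at which a word was spoken (text -> audio lookup)."""
        target = word.lower()
        return [e.time for e in self.events
                if e.kind == "word" and e.value.lower() == target]


# Illustrative data only; a real system would obtain word times from
# a speech-text alignment step.
events = [
    Event(0.0, "slide", "lesson1.html"),
    Event(1.2, "word", "welcome"),
    Event(4.0, "scroll", "120"),
    Event(5.5, "word", "grammar"),
    Event(9.0, "slide", "lesson2.html"),
    Event(10.1, "word", "grammar"),
]
index = LectureIndex(events)
```

With such an index, a text query like `index.seek_word("grammar")` yields candidate playback positions, and `index.state_at(t)` recovers which slide and scroll position should be shown at time `t`, which is the kind of synchronized replay the abstract describes.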



Author information


Corresponding author

Correspondence to Wei-Ta Chu.

Additional information

Published online: 14 December 2004


About this article

Cite this article

Chu, WT., Chen, HY. Toward better retrieval and presentation by exploring cross-media correlations. Multimedia Systems 10, 183–198 (2005). https://doi.org/10.1007/s00530-004-0150-7


  • DOI: https://doi.org/10.1007/s00530-004-0150-7
