Abstract.
This paper addresses explicit and implicit correlations between the media streams in a composite multimedia document, called a navigated hypermedia document in our language learning system, in order to facilitate document retrieval and synchronized presentation. To replay a recorded lecture in a form as close as possible to the original classroom experience, we devised a capturing mechanism that explicitly records all the lecturing media streams and the relations between them, including the instructor’s voice, slide changes in the HTML lectures, and various guiding actions (e.g., tele-pointers, pen strokes, document scrolling, keyword highlighting, and text annotations) on HTML-based slides. In addition, for more effective learning, we study three aspects (temporal, spatial, and content relations) of the implicit correlations that are inherently hidden between the media involved. The implicit relations are discovered by three processes: the speech-text alignment process for temporally synchronized speech-text presentation, the automatic scrolling process for spatial synchronization of the viewing window, and the content dependency checking process, which ensures consistency between the content processed and the relations involved. The experimental results show that exploring cross-media correlations helps system development in document presentation and retrieval. Users can replay a vivid, learning-effective multimedia lecture and easily access the desired part of the document via cross-media indexing. The results have been applied to the development of online multimedia language learning systems aimed at improving students’ English and Chinese language capabilities.
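As a rough illustration of what cross-media indexing over a speech-text alignment could look like, the sketch below maps a playback time to the word being spoken and maps a query word back to the times at which it is spoken. It is a minimal sketch, not the paper's implementation: the names `AlignedWord` and `CrossMediaIndex`, the per-word start and end times in seconds, and the sample alignment are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One word of the slide text with its (assumed) aligned speech interval."""
    text: str
    start: float  # seconds into the recorded speech
    end: float    # seconds into the recorded speech


class CrossMediaIndex:
    """Hypothetical two-way index between transcript words and audio time."""

    def __init__(self, words):
        # Keep words ordered by start time so lookups can use binary search.
        self.words = sorted(words, key=lambda w: w.start)
        self._starts = [w.start for w in self.words]

    def word_at(self, t):
        """Return the word spoken at time t, or None if t falls in a gap."""
        i = bisect_right(self._starts, t) - 1
        if i >= 0 and t < self.words[i].end:
            return self.words[i]
        return None

    def seek_times(self, query):
        """Return the start times of every occurrence of the query word."""
        q = query.lower()
        return [w.start for w in self.words if w.text.lower() == q]


# Made-up alignment data, standing in for the output of an alignment process.
alignment = [
    AlignedWord("cross", 0.00, 0.35),
    AlignedWord("media", 0.35, 0.80),
    AlignedWord("correlation", 0.90, 1.60),
    AlignedWord("media", 2.10, 2.50),
]
index = CrossMediaIndex(alignment)
```

With such an index, a player could call `index.word_at(t)` on each playback tick to highlight the current word, and a search box could call `index.seek_times("media")` to jump to the matching points in the audio.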
Published online: 14 December 2004
Cite this article
Chu, WT., Chen, HY. Toward better retrieval and presentation by exploring cross-media correlations. Multimedia Systems 10, 183–198 (2005). https://doi.org/10.1007/s00530-004-0150-7
DOI: https://doi.org/10.1007/s00530-004-0150-7