Abstract
We present a multimodal document alignment framework that highlights the alignment relationships between documents discussed and recorded during multimedia events such as meetings. These relationships, which should help index the archives of such events, are detected using techniques from natural language processing and information retrieval. The main alignment strategies studied are based on thematic, quotation and reference relationships. The framework was applied at several levels of document granularity, each requiring specific document segmentation techniques. The framework is language independent and was evaluated on corpora in French and English, including meetings and scientific presentations. The satisfactory evaluation results obtained at several stages show the importance of our approach in bridging the gap between meeting documents, independently of language and domain. They also highlight the utility of multimodal alignment in advanced applications such as multimedia document browsing and content-based or temporal-based searching.
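To make the idea of a thematic relationship concrete, the following is a minimal sketch, not the framework described above. It assumes that thematic similarity can be approximated by cosine similarity over bag-of-words term frequencies, and that both the documents and the speech transcript have already been segmented. The function names (`tokenize`, `cosine`, `thematic_links`) and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def thematic_links(doc_segments, speech_segments, threshold=0.3):
    """Return (doc_index, speech_index, score) for every pair of segments
    whose similarity reaches the threshold (an assumed, untuned value)."""
    doc_vecs = [Counter(tokenize(s)) for s in doc_segments]
    speech_vecs = [Counter(tokenize(s)) for s in speech_segments]
    links = []
    for i, d in enumerate(doc_vecs):
        for j, s in enumerate(speech_vecs):
            score = cosine(d, s)
            if score >= threshold:
                links.append((i, j, score))
    return links


# Illustrative input: two document segments and two transcript segments.
docs = ["budget report for the project", "new office furniture catalogue"]
speech = ["let us discuss the project budget", "the weather is nice today"]
links = thematic_links(docs, speech)
```

A real system would likely add stop-word removal, stemming and term weighting, all of which are language dependent; the sketch omits them to stay short.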
Cite this article
Mekhaldi, D., Lalanne, D. & Ingold, R. A multimodal alignment framework for spoken documents. Multimed Tools Appl 61, 353–388 (2012). https://doi.org/10.1007/s11042-011-0842-x