A multimodal alignment framework for spoken documents

Mekhaldi, Dalila; Lalanne, Denis; Ingold, Rolf

doi:10.1007/s11042-011-0842-x

Published: 13 July 2011

Volume 61, pages 353–388, (2012)

Multimedia Tools and Applications

Abstract

We present a multimodal document alignment framework that highlights the alignment relationships between documents that are discussed and recorded during multimedia events such as meetings. These relationships, which should help index the archives of such events, are detected using various techniques from natural language processing and information retrieval. The main alignment strategies studied are based on thematic, quotation and reference relationships. At the analysis level, the framework was applied at several levels of document granularity, which required specific document segmentation techniques. Our framework, which is language independent, was evaluated on corpora in French and English, including meetings and scientific presentations. The satisfactory evaluation results obtained at several stages show the importance of our approach in bridging the gap between meeting documents, independently of language and domain. They also highlight the utility of multimodal alignment in advanced applications such as multimedia document browsing and content-based or temporal-based searching.
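
To make the idea of a thematic alignment concrete, the sketch below links segments of a written document to segments of a speech transcript when their vocabularies overlap strongly. It is an illustration only and not the authors' implementation: the bag-of-words representation, the cosine similarity measure, the `threshold` parameter, and the function names `tokenize`, `cosine` and `align` are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(doc_segments, speech_segments, threshold=0.2):
    """Return (doc_index, speech_index, score) for every segment pair
    whose similarity reaches the (hypothetical) threshold."""
    doc_vecs = [Counter(tokenize(s)) for s in doc_segments]
    speech_vecs = [Counter(tokenize(s)) for s in speech_segments]
    links = []
    for i, d in enumerate(doc_vecs):
        for j, s in enumerate(speech_vecs):
            score = cosine(d, s)
            if score >= threshold:
                links.append((i, j, score))
    return links
```

Because the sketch relies only on word overlap, it uses no language-specific resources beyond the tokenizer, whose `[a-z]+` pattern would need widening for accented French text. Common function words inflate the scores, so a stop-word list or a higher threshold would likely be needed in practice.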

Author information

Authors and Affiliations

Computational Linguistics Group, University of Wolverhampton, Wolverhampton, UK
Dalila Mekhaldi
Department of Informatics, University of Fribourg, Fribourg, Switzerland
Denis Lalanne & Rolf Ingold

Corresponding author

Correspondence to Dalila Mekhaldi.

About this article

Cite this article

Mekhaldi, D., Lalanne, D. & Ingold, R. A multimodal alignment framework for spoken documents. Multimed Tools Appl 61, 353–388 (2012). https://doi.org/10.1007/s11042-011-0842-x

Issue Date: November 2012
DOI: https://doi.org/10.1007/s11042-011-0842-x
