PEDIVHANDI: Multimodal Indexation and Retrieval System for Lecture Videos

Van Nguyen, Nhu; Ogier, Jean-Marc; Charneau, Franck

doi:10.1007/978-3-642-37444-9_30

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7725)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Because slide text and the teacher’s speech represent lecture content in complementary ways, lecture videos can be indexed and retrieved by a fully automatic system based on multimodal analysis of speech and text. In this paper, we present the multimodal lecture content indexing approach used in the PEDIVHANDI project. We use the discretization of speech and changes in slide text to identify lecture slides in the video. We also propose a duplicate verification step to remove near-duplicate slides. After the Stroke Width Transform (SWT) text detector locates text regions, a standard OCR engine performs text recognition. Finally, a context-based spell check corrects the recognized words. Our system achieves 71% recognition precision and 57% recall on a corpus of 6 presentation videos with a total duration of 8 hours.
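The abstract does not say how the duplicate verification is carried out. As a rough illustration only, the sketch below assumes that each detected slide has already been reduced to its OCR text and that two consecutive slides count as near-duplicates when their normalized edit-distance similarity reaches a threshold. The function names, the threshold of 0.9, and the choice of Levenshtein distance are all assumptions made for this example, not details taken from the paper.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rows of dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the two strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def drop_near_duplicates(slide_texts: List[str],
                         threshold: float = 0.9) -> List[str]:
    """Keep a slide only if it differs enough from the last kept slide.

    Hypothetical criterion: the threshold and the comparison against the
    previously kept slide are illustrative assumptions.
    """
    kept: List[str] = []
    for text in slide_texts:
        if not kept or similarity(kept[-1], text) < threshold:
            kept.append(text)
    return kept


if __name__ == "__main__":
    slides = [
        "Introduction to OCR",
        "Introduction to 0CR",   # same slide, one OCR character error
        "Stroke Width Transform",
    ]
    print(drop_near_duplicates(slides))
```

Comparing each slide only with the last kept one suits a lecture video, where repeated detections of the same slide tend to be adjacent in time. A slide that reappears later in the talk would be kept again under this scheme.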

Author information

Authors and Affiliations

L3I, University of La Rochelle, La Rochelle, France
Nhu Van Nguyen & Jean-Marc Ogier
@ctice, University of La Rochelle, La Rochelle, France
Franck Charneau

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, 151-744, Seoul, Korea
Kyoung Mu Lee
Microsoft Research Asia, No. 5, Danling st., Haidian district, 100080, Beijing, P.R. China
Yasuyuki Matsushita
School of Interactive Computing, Georgia Institute of Technology, 801 Atlantic Drive, CCB 315, 30332, Atlanta, GA, USA
James M. Rehg
Institute of Automation, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Zhong Quan Cun East Road 95, Haidian District, 100 190, Beijing, P.R. China
Zhanyi Hu

About this paper

Cite this paper

Van Nguyen, N., Ogier, JM., Charneau, F. (2013). PEDIVHANDI: Multimodal Indexation and Retrieval System for Lecture Videos. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37444-9_30

DOI: https://doi.org/10.1007/978-3-642-37444-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37443-2
Online ISBN: 978-3-642-37444-9
eBook Packages: Computer Science, Computer Science (R0)
