Abstract
Because the text on slides and the teacher’s speech represent lecture content in complementary ways, lecture videos can be indexed and retrieved by a fully automatic, complete system based on multimodal analysis of speech and text. In this paper, we present the multimodal lecture content indexing approach used in the PEDIVHANDI project. We use the discretization of speech and changes in slide text to identify lecture slides in the video. We also propose a duplicate verification step to remove near-duplicate slides. After the Stroke Width Transform (SWT) text detector is used to obtain text regions, a standard OCR engine performs text recognition. Finally, we propose a context-based spell check to correct the recognized words. Our system achieves a recognition precision of 71% and a recall of 57% on a corpus of 6 presentation videos with a total duration of 8 hours.
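The abstract does not specify how duplicate verification decides that two slides are "nearly duplicate". One plausible approach compares the OCR text of consecutive slide candidates with a normalized edit distance. The sketch below assumes exactly that, so it is an illustration rather than the PEDIVHANDI method. The helper names, the lowercase/whitespace normalization, and the 0.2 threshold are all hypothetical choices.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.2) -> bool:
    """Assumed rule: two slides are near-duplicates when the edit distance of their
    normalized OCR text, divided by the longer length, is at most `threshold`."""
    a, b = normalize(text_a), normalize(text_b)
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two slides with no recognized text
    return levenshtein(a, b) / longest <= threshold


def remove_near_duplicates(slide_texts: list[str], threshold: float = 0.2) -> list[int]:
    """Return the indices of slides kept after dropping every slide that is a
    near-duplicate of the most recently kept slide."""
    kept: list[int] = []
    for idx, text in enumerate(slide_texts):
        if kept and is_near_duplicate(slide_texts[kept[-1]], text, threshold):
            continue
        kept.append(idx)
    return kept
```

For example, `remove_near_duplicates(["Introduction to Indexing", "Introduction to lndexing", "Stroke Width Transform"])` returns `[0, 2]`. The second slide differs from the first only by an OCR confusion of "I" with "l", so it is dropped. A normalized distance tolerates small recognition errors while still separating slides whose content actually changed. A real system would tune the threshold on annotated data.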
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Van Nguyen, N., Ogier, JM., Charneau, F. (2013). PEDIVHANDI: Multimodal Indexation and Retrieval System for Lecture Videos. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37444-9_30
DOI: https://doi.org/10.1007/978-3-642-37444-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37443-2
Online ISBN: 978-3-642-37444-9
eBook Packages: Computer Science, Computer Science (R0)