PEDIVHANDI: Multimodal Indexation and Retrieval System for Lecture Videos

  • Conference paper
Computer Vision – ACCV 2012 (ACCV 2012)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7725)

Abstract

Because the text on slides and the teacher's speech represent lecture content in complementary ways, lecture videos can be indexed and retrieved by a fully automatic and complete system based on multimodal analysis of speech and text. In this paper, we present the multimodal lecture content indexing approach used in the PEDIVHANDI project. We use the discretization of speech and changes in slide text to identify lecture slides in the video. We also propose a duplicate verification step to remove near-duplicate slides. After the Stroke Width Transform (SWT) text detector locates text regions, a standard OCR engine performs text recognition. Finally, a context-based spell check is proposed to correct the recognized words. Our system achieves a recognition precision of 71% and a recall of 57% on a corpus of 6 presentation videos with a total duration of 8 hours.
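The abstract does not say how the duplicate verification or the spell check are implemented. The sketch below is only an illustration of the two ideas and is not the authors' method. It assumes that near-duplicate slides can be detected by comparing their recognized text with a similarity ratio, and that OCR errors can be corrected by snapping each word to the nearest entry in a vocabulary gathered from the lecture context. The function names, the 0.9 similarity threshold, and the maximum edit distance of 2 are all hypothetical choices.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_word(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word within max_distance, else the word unchanged.

    `vocabulary` stands in for the lecture context (an assumption of this sketch).
    """
    if word in vocabulary:
        return word
    best, best_d = word, max_distance + 1
    for cand in sorted(vocabulary):  # sorted for deterministic tie-breaking
        d = edit_distance(word.lower(), cand.lower())
        if d < best_d:
            best, best_d = cand, d
    return best


def drop_near_duplicates(slide_texts, threshold=0.9):
    """Keep a slide only if its text differs enough from the last kept slide."""
    kept = []
    for text in slide_texts:
        if kept and SequenceMatcher(None, kept[-1], text).ratio() >= threshold:
            continue  # treated as a near-duplicate of the previous kept slide
        kept.append(text)
    return kept


if __name__ == "__main__":
    vocab = {"Stroke", "Width", "Transform"}
    print(correct_word("Transfrom", vocab))
    print(drop_near_duplicates([
        "Stroke Width Transform",
        "Stroke Width Transfrom",
        "OCR and spell check",
    ]))
```

Comparing each slide only with the last kept slide suits a video in which duplicates appear consecutively. A slide that the presenter returns to later would need a comparison against every kept slide instead.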

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Van Nguyen, N., Ogier, JM., Charneau, F. (2013). PEDIVHANDI: Multimodal Indexation and Retrieval System for Lecture Videos. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds) Computer Vision – ACCV 2012. ACCV 2012. Lecture Notes in Computer Science, vol 7725. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37444-9_30

  • DOI: https://doi.org/10.1007/978-3-642-37444-9_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37443-2

  • Online ISBN: 978-3-642-37444-9

  • eBook Packages: Computer Science, Computer Science (R0)