Extracting content from instructional videos by statistical modelling and classification

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

This paper presents a robust approach to extracting content from instructional videos for handwriting recognition, indexing and retrieval, and other e-learning applications. For instructional videos of chalkboard presentations, retrieving the handwritten content on the board (e.g., characters, drawings, figures) is the prerequisite first step towards further exploration of instructional video content. However, content extraction from instructional videos remains challenging because of video noise, non-uniform color within board regions, changes in lighting conditions during a video session, camera movements, and unavoidable occlusions by instructors. To solve this problem, we first segment video frames into multiple regions and estimate the parameters of the board regions through statistical analysis of the pixels in the dominant regions. We then accurately separate the board regions from irrelevant regions using a probabilistic classifier. Finally, we combine top-hat morphological processing with a gradient-based adaptive thresholding technique to retrieve content pixels from the board regions. An evaluation of the content extraction results on four full-length instructional videos shows the high performance of the proposed method. Extracting the content text facilitates research on the full exploitation of instructional videos, such as content enhancement, indexing, and retrieval.
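
The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a grayscale frame stored as a list of rows, a single Gaussian board model fitted to hand-picked sample pixels, and a fixed threshold of k standard deviations swapped in for the gradient-based adaptive thresholding, whose details the abstract does not give. All function names and parameters (`fit_board_model`, `opening`, `extract_content`, `r`, `k`) are hypothetical.

```python
import statistics


def _window(img, i, j, r):
    """Return the pixel values in the (2r+1) x (2r+1) window around (i, j), clipped at the borders."""
    h, w = len(img), len(img[0])
    return [img[y][x]
            for y in range(max(0, i - r), min(h, i + r + 1))
            for x in range(max(0, j - r), min(w, j + r + 1))]


def _filter(img, r, reduce_fn):
    """Apply reduce_fn (min for erosion, max for dilation) over every window."""
    return [[reduce_fn(_window(img, i, j, r)) for j in range(len(img[0]))]
            for i in range(len(img))]


def opening(img, r):
    """Grayscale opening: erosion followed by dilation. Removes bright details thinner than the window."""
    return _filter(_filter(img, r, min), r, max)


def fit_board_model(samples):
    """Estimate mean and standard deviation of board intensity from sample pixels (assumed Gaussian)."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def extract_content(img, mean, std, r=1, k=3.0):
    """Flag pixels whose local background looks like board and whose top-hat response is large.

    The top-hat response is pixel minus opened image. The k-sigma test stands in
    for both the probabilistic classifier and the adaptive threshold (an assumption).
    """
    tol = k * max(std, 1.0)  # floor the noise estimate so a flat sample does not give a zero tolerance
    background = opening(img, r)
    return [[abs(bg - mean) <= tol and (p - bg) > tol
             for p, bg in zip(row, bg_row)]
            for row, bg_row in zip(img, background)]


# Toy frame: dark board (about 40), a one-pixel chalk stroke (200), and a wide non-board region (120).
frame = [[40, 42, 40, 200, 41, 120, 120] for _ in range(5)]
board_samples = [p for row in frame for p in row[:3]]
mean, std = fit_board_model(board_samples)
mask = extract_content(frame, mean, std)
print([j for j, hit in enumerate(mask[0]) if hit])  # prints [3]
```

In this toy frame only the chalk stroke in column 3 is flagged. The wide region of value 120 survives the opening, so it produces no top-hat response, and its background estimate also fails the board test. Classifying the opened image rather than the raw pixels is a design choice of this sketch: it lets bright strokes inherit the label of the board behind them.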

Author information

Correspondence to Chekuri Choudary.

About this article

Cite this article

Choudary, C., Liu, T. Extracting content from instructional videos by statistical modelling and classification. Pattern Anal Applic 10, 69–81 (2007). https://doi.org/10.1007/s10044-006-0051-9
