A novel approach to automatic detection of presentation slides in educational videos

  • S.I.: Neural Computing in Next Generation Virtual Reality Technology
  • Neural Computing and Applications

Abstract

Recent advances in learning and teaching methodology have experimented with virtual reality (VR)-based presentation formats to create immersive learning and training environments. The quality of such educational VR applications relies not only on the virtual model but also on 2D presentation materials such as text, diagrams and figures. However, manually designing or searching for these educational resources is both labor-intensive and time-consuming. In this paper, we introduce a new automatic algorithm to detect and extract presentation slides from educational videos, which provides abundant resources for creating slide-based immersive presentation environments. The proposed approach involves five core components: shot boundary detection, training instance collection, shot classification, slide region detection and slide transition detection. We conducted a comparison experiment to evaluate the performance of the proposed method. The results indicate that, compared with a peer method, the proposed method improves the precision of slide detection from 81.6% to 92.6% and the recall from 74.7% to 86.3% on average. With the detected slides, a content analyzer can be employed to further extract reusable elements, which can be used for developing VR-based educational applications.
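The precision and recall figures quoted in the abstract can be read through the standard definitions of those two metrics. The sketch below is only an illustration of those definitions, not the authors' evaluation code: the function name `precision_recall`, the toy slide IDs, and the assumption that each detected slide can be matched to a ground-truth slide by a simple ID are all hypothetical.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for detected vs. true slide IDs.

    precision = correctly detected slides / all detected slides
    recall    = correctly detected slides / all true slides
    """
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical toy data: IDs of slides a detector reported vs. slides
# actually present in a video (not taken from the paper).
detected = {1, 2, 3, 5, 8}
ground_truth = {1, 2, 3, 4, 5, 6}

p, r = precision_recall(detected, ground_truth)
print(f"precision={p:.3f}, recall={r:.3f}")
```

In this toy case four of the five reported slides are correct and four of the six true slides are found, so a detector can score well on one metric while missing slides on the other, which is why the abstract reports both.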

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61572531, 61232011, 61502546, 61402546) and the Science and Technology Planning Project of Zhongshan (No. 2016A1044).

Author information

Corresponding author

Correspondence to Xiaonan Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhao, B., Lin, S., Qi, X. et al. A novel approach to automatic detection of presentation slides in educational videos. Neural Comput & Applic 29, 1369–1382 (2018). https://doi.org/10.1007/s00521-017-3276-1

  • DOI: https://doi.org/10.1007/s00521-017-3276-1
