A novel approach to automatic detection of presentation slides in educational videos

Zhao, Baoquan; Lin, Shujin; Qi, Xin; Wang, Ruomei; Luo, Xiaonan

doi:10.1007/s00521-017-3276-1

S.I. : Neural Computing in Next Generation Virtual Reality Technology
Published: 19 December 2017

Volume 29, pages 1369–1382, (2018)

Neural Computing and Applications

Baoquan Zhao¹,
Shujin Lin²,
Xin Qi¹,
Ruomei Wang¹ &
…
Xiaonan Luo³

1087 Accesses
7 Citations
Abstract

Recent advances in learning and teaching methodology have experimented with virtual reality (VR)-based presentation forms to create immersive learning and training environments. The quality of such educational VR applications relies not only on the virtual model but also on 2D presentation materials such as text, diagrams and figures. However, manually designing or searching for these educational resources is both labor-intensive and time-consuming. In this paper, we introduce a new automatic algorithm to detect and extract presentation slides from educational videos, which will provide abundant resources for creating slide-based immersive presentation environments. The proposed approach involves five core components: shot boundary detection, training instance collection, shot classification, slide region detection and slide transition detection. We conducted a comparison experiment to evaluate the performance of the proposed method. The results indicate that, in comparison with a peer method, the proposed method improves the precision of slide detection from 81.6% to 92.6% and the recall from 74.7% to 86.3% on average. With the detected slides, a content analyzer can be employed to further extract reusable elements, which can be used for developing VR-based educational applications.
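For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how precision and recall are conventionally computed for a detection task, together with the absolute gains implied by the figures in the abstract. It is not the authors' evaluation code: the function name `precision_recall`, the use of integer slide identifiers, and the toy data are assumptions made purely for illustration.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for detected vs. true slide IDs.

    Hypothetical helper: slides are represented by arbitrary hashable IDs.
    """
    detected, ground_truth = set(detected), set(ground_truth)
    true_positives = len(detected & ground_truth)
    # Precision: fraction of detections that are real slides.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: fraction of real slides that were detected.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data (invented): 10 true slides, 9 detections, 8 of them correct.
truth = range(10)
found = [0, 1, 2, 3, 4, 5, 6, 7, 42]
p, r = precision_recall(found, truth)

# Absolute gains, in percentage points, from the figures quoted in the abstract.
precision_gain = round(92.6 - 81.6, 1)
recall_gain = round(86.3 - 74.7, 1)
```

On the quoted averages, the improvement amounts to 11.0 percentage points in precision and 11.6 percentage points in recall.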

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61572531, 61232011, 61502546, 61402546) and the Science and Technology Planning Project of Zhongshan (No. 2016A1044).

Author information

Authors and Affiliations

National Engineering Research Center of Digital Life, School of Data and Computer Science, Sun Yat-sen University, Guangzhou, 510006, China
Baoquan Zhao, Xin Qi & Ruomei Wang
School of Communication and Design, Sun Yat-sen University, Guangzhou, 510006, China
Shujin Lin
School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
Xiaonan Luo

Corresponding author

Correspondence to Xiaonan Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Zhao, B., Lin, S., Qi, X. et al. A novel approach to automatic detection of presentation slides in educational videos. Neural Comput & Applic 29, 1369–1382 (2018). https://doi.org/10.1007/s00521-017-3276-1

Received: 18 December 2016
Accepted: 13 November 2017
Published: 19 December 2017
Issue Date: March 2018
DOI: https://doi.org/10.1007/s00521-017-3276-1
