Enhancing Student Classroom Behavior Detection Using Improved SlowFast

Zhao, Fuzhe; Yang, Wen; Han, Wenlin

doi:10.1007/978-3-031-71464-1_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14997)

Included in the following conference series:

International Conference on Wireless Artificial Intelligent Computing Systems and Applications

Abstract

Detecting classroom behaviors in teaching videos suffers from low accuracy, a problem compounded by the absence of a public classroom behavior dataset. To address both issues, we construct a dataset specific to student classroom behavior and propose a detection method based on an improved SlowFast architecture. The method makes three changes. First, we replace the 3×3 convolution in the residual network with a pyramid segmentation attention module, which establishes long-range channel attention dependencies while capturing multi-scale spatial information. Second, we add a Transformer encoding module to the fast branch of the network to capture more temporal information, thereby improving model accuracy. Third, we replace the loss function of the original model with a dynamically scaled cross-entropy loss. This adjustment reduces the loss weight of easily classified samples, so the network prioritizes challenging samples and copes better with the imbalance between positive and negative samples. Experimental results on our self-constructed classroom behavior dataset show that the improved model achieves an average precision of 89.04%, a 3.58% improvement over the original model, and effectively reduces false detections.
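
The abstract does not state the formula of the dynamically scaled cross-entropy loss. A loss that scales cross-entropy down for easy samples has the general shape of focal loss, so the minimal sketch below assumes that form: the per-sample cross-entropy is multiplied by (1 - p)^gamma, where p is the predicted probability of the true class. The function names and the value gamma = 2.0 are illustrative assumptions, not taken from the paper.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Standard cross-entropy for one sample, given the predicted
    probability of its true class (0 < p_true <= 1)."""
    return -math.log(p_true)


def scaled_cross_entropy(p_true: float, gamma: float = 2.0) -> float:
    """Cross-entropy multiplied by a dynamic factor (1 - p_true)^gamma.

    Assumed focal-loss-style form; gamma is a hypothetical setting.
    With gamma = 0 the factor is 1 and this reduces to cross_entropy.
    """
    modulating_factor = (1.0 - p_true) ** gamma
    return modulating_factor * cross_entropy(p_true)


if __name__ == "__main__":
    # Compare an easy sample (confident, correct) with a hard one.
    for label, p in (("easy", 0.9), ("hard", 0.1)):
        ce = cross_entropy(p)
        scaled = scaled_cross_entropy(p)
        print(f"{label}: p={p} CE={ce:.4f} scaled={scaled:.4f} ratio={scaled / ce:.2f}")
```

Under these assumptions the easy sample keeps about 1% of its original loss while the hard sample keeps about 81%, which is the reweighting effect the abstract describes.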

Acknowledgments

This research is funded by the National Natural Science Foundation of China (62377026) and the Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, 430079, P.R. China. We are also grateful to Professor Xie Wei, who devoted much time to reading this paper and gave us much advice.

Author information

Authors and Affiliations

School of Computer Science, Central China Normal University, Wuhan, China
Fuzhe Zhao & Wen Yang
Department of Computer Science, California State University, Fullerton, USA
Wenlin Han

Corresponding author

Correspondence to Wenlin Han.

Editor information

Editors and Affiliations

Georgia State University, Atlanta, GA, USA
Zhipeng Cai
Old Dominion University, Norfolk, VA, USA
Daniel Takabi
Beijing University of Posts and Telecommunications, Beijing, China
Shaoyong Guo
Shandong University, Qingdao, China
Yifei Zou

Cite this paper

Zhao, F., Yang, W., Han, W. (2025). Enhancing Student Classroom Behavior Detection Using Improved SlowFast. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_24

DOI: https://doi.org/10.1007/978-3-031-71464-1_24
Published: 13 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71463-4
Online ISBN: 978-3-031-71464-1
eBook Packages: Computer Science (R0)
