Abstract
To address the low accuracy of classroom behavior detection in teaching videos and the absence of a public classroom behavior dataset, we construct a student classroom behavior dataset and propose a detection method based on an improved SlowFast architecture. The method makes three changes to the original model. First, a pyramid segmentation attention module replaces the 3x3 convolution in the residual network; this module establishes long-range dependencies in channel attention while capturing multi-scale spatial information. Second, a Transformer encoding module is added to the fast branch of the network to capture more temporal information, thereby improving model accuracy. Third, the loss function of the original model is replaced with a dynamically scaled cross-entropy loss, which reduces the loss weight of easily classified samples so that the network prioritizes challenging samples and mitigates the imbalance between positive and negative samples. Experimental results on our self-constructed classroom behavior dataset show that the improved model achieves an average precision of 89.04%, a 3.58% improvement over the original model, and effectively reduces false detections.
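The abstract does not state the formula of the dynamically scaled cross-entropy loss. As a minimal sketch, assuming it follows the common focal-loss form, in which the cross-entropy of a sample is multiplied by (1 - p)^gamma with p the predicted probability of the correct class, the down-weighting of easy samples can be illustrated as follows. The function name `dynamically_scaled_ce` and the focusing parameter `gamma` are hypothetical and are not taken from the paper.

```python
import math


def dynamically_scaled_ce(p_true, gamma=2.0):
    """Cross-entropy for one sample, scaled by (1 - p_true) ** gamma.

    p_true: predicted probability of the correct class, in (0, 1].
    gamma:  assumed focusing parameter (hypothetical value);
            gamma = 0 recovers plain cross-entropy.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    ce = -math.log(p_true)                 # standard cross-entropy term
    return (1.0 - p_true) ** gamma * ce    # dynamic scaling factor


# An easy sample (confident, correct) versus a hard one.
easy = dynamically_scaled_ce(0.9)
hard = dynamically_scaled_ce(0.3)
plain_easy = dynamically_scaled_ce(0.9, gamma=0.0)
```

Under these assumptions with gamma = 2, a sample predicted at 0.9 keeps a factor of 0.01 of its cross-entropy, while a sample predicted at 0.3 keeps a factor of 0.49, so hard samples contribute most of the loss.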
Acknowledgments
This research is funded by the National Natural Science Foundation of China (62377026) and the Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, 430079, P.R. China. We are also grateful to Professor Xie Wei, who devoted much time to reading this paper and gave us much advice.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhao, F., Yang, W., Han, W. (2025). Enhancing Student Classroom Behavior Detection Using Improved SlowFast. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_24
DOI: https://doi.org/10.1007/978-3-031-71464-1_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71463-4
Online ISBN: 978-3-031-71464-1
eBook Packages: Computer Science (R0)