Enhancing Student Classroom Behavior Detection Using Improved SlowFast

  • Conference paper
Wireless Artificial Intelligent Computing Systems and Applications (WASA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14997)

Abstract

Detecting student behavior in classroom teaching videos suffers from low accuracy, a problem compounded by the absence of a public classroom behavior dataset. To address both issues, we construct a dataset of student classroom behavior and propose a detection method based on an improved SlowFast architecture. The method makes three changes. First, a pyramid segmentation attention module replaces the 3×3 convolution in the residual network; it captures multi-scale spatial information while establishing long-range dependencies in channel attention. Second, a Transformer encoding module is added to the fast branch of the network to capture more temporal information, which improves accuracy. Third, the loss function of the original model is replaced with a dynamically scaled cross-entropy loss. This reduces the loss weight of easily classified samples, so that the network prioritizes hard samples and the imbalance between positive and negative samples is mitigated. On our self-constructed classroom behavior dataset, the improved model achieves an average precision of 89.04%, a 3.58% improvement over the original model, and effectively reduces false detections.
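The abstract does not give the formula for the dynamically scaled cross-entropy loss. As a minimal sketch, assuming it follows the widely used focal-loss form, in which cross-entropy is multiplied by a factor (1 - p_t)^gamma that shrinks as the prediction becomes more confident, the effect on easy and hard samples can be illustrated as follows. The function names and the defaults gamma = 2.0 and alpha = 0.25 are illustrative assumptions, not values taken from the paper.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def _p_true(p, y):
    """Probability the model assigns to the true class (y is 0 or 1)."""
    p_t = p if y == 1 else 1.0 - p
    return min(max(p_t, EPS), 1.0)


def cross_entropy(p, y):
    """Plain binary cross-entropy for a single sample."""
    return -math.log(_p_true(p, y))


def scaled_cross_entropy(p, y, gamma=2.0, alpha=0.25):
    """Focal-style loss (assumed form): alpha_t * (1 - p_t)^gamma * CE.

    gamma and alpha are hypothetical defaults chosen for illustration.
    """
    p_t = _p_true(p, y)
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balance weight
    return alpha_t * (1.0 - p_t) ** gamma * cross_entropy(p, y)


if __name__ == "__main__":
    # An easy positive (p = 0.95) versus a hard positive (p = 0.30).
    for name, p in (("easy", 0.95), ("hard", 0.30)):
        ce = cross_entropy(p, 1)
        sce = scaled_cross_entropy(p, 1)
        print(f"{name}: CE={ce:.4f} scaled={sce:.6f} ratio={sce / ce:.6f}")
```

With gamma = 0 the scaling factor is constant and the loss reduces to alpha-weighted cross-entropy; larger gamma values push the contribution of well-classified samples closer to zero, which is the behavior the abstract describes.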

Acknowledgments

This research was funded by the National Natural Science Foundation of China (62377026) and the Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, Hubei, 430079, P.R. China. We are also grateful to Professor Xie Wei, who devoted much time to reading this paper and gave us much advice.

Author information

Corresponding author

Correspondence to Wenlin Han.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhao, F., Yang, W., Han, W. (2025). Enhancing Student Classroom Behavior Detection Using Improved SlowFast. In: Cai, Z., Takabi, D., Guo, S., Zou, Y. (eds) Wireless Artificial Intelligent Computing Systems and Applications. WASA 2024. Lecture Notes in Computer Science, vol 14997. Springer, Cham. https://doi.org/10.1007/978-3-031-71464-1_24

  • DOI: https://doi.org/10.1007/978-3-031-71464-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71463-4

  • Online ISBN: 978-3-031-71464-1

  • eBook Packages: Computer Science, Computer Science (R0)
