Abstract
Assessing students’ classroom behavior is an important part of evaluating classroom teaching. However, teachers cannot evaluate the listening status of every student in a class in a timely, objective, and accurate manner. We propose a multitask classroom behavior recognition method that combines human pose estimation and object detection. First, an object detector extracts each individual’s region from the keyframe as the network’s input. Then, the multitask heatmap network (MTHN) module extracts an intermediate heatmap of multiscale feature associations. The pose estimation and object detection tasks are constructed through mapping relations to obtain the keypoints and object position information. Finally, the keypoint behavior vector and the metric vector are used to model the behavior, and a classroom behavior detection algorithm based on a fully connected network is designed. Additionally, we created a classroom dataset with pose estimation, object, and behavior labels, and used transfer learning to address the insufficient sample size. Experiments show that the detection accuracy of the proposed multitask learning-based student behavior recognition algorithm exceeds 90%.
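The final stages of the pipeline (heatmap to keypoints, then keypoints and object position to feature vectors) can be illustrated with a minimal sketch. The abstract does not define the keypoint behavior vector or the metric vector, so the definitions below are assumptions made for illustration only: the behavior vector is taken to be keypoint offsets from a reference joint, and the metric vector is taken to be keypoint-to-object distances. The function names `decode_heatmap`, `behavior_vector`, `metric_vector`, and `classifier_input` are hypothetical and are not taken from the paper.

```python
import math


def decode_heatmap(heatmap):
    """Return (x, y, score) of the strongest cell in a 2D heatmap (list of rows)."""
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


def behavior_vector(keypoints, reference):
    """Assumed definition: keypoint offsets from a reference joint,
    divided by the largest offset length so the result is scale-invariant."""
    rx, ry = reference
    offsets = [(x - rx, y - ry) for x, y in keypoints]
    # Fall back to 1.0 when every keypoint coincides with the reference.
    scale = max(math.hypot(dx, dy) for dx, dy in offsets) or 1.0
    return [c / scale for dx, dy in offsets for c in (dx, dy)]


def metric_vector(keypoints, object_center):
    """Assumed definition: distance from each keypoint to a detected object's center."""
    ox, oy = object_center
    return [math.hypot(x - ox, y - oy) for x, y in keypoints]


def classifier_input(keypoints, reference, object_center):
    """Concatenate both vectors into one input for a fully connected classifier."""
    return behavior_vector(keypoints, reference) + metric_vector(keypoints, object_center)


if __name__ == "__main__":
    # Toy heatmap for a single joint; the peak is at column 1, row 1.
    print(decode_heatmap([[0.1, 0.2, 0.0], [0.0, 0.9, 0.3]]))
    # Two toy keypoints, with the second used as the reference joint
    # and a hypothetical object (e.g. a book) centered at the origin.
    print(classifier_input([(3, 4), (0, 0)], reference=(0, 0), object_center=(0, 0)))
```

In this sketch the concatenated vector has three values per keypoint (two normalized offsets and one distance); a real system would feed such a vector to the fully connected network described above, with the exact features chosen by the authors.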
Funding
This work was supported by the National Natural Science Foundation of China (Grant Numbers 62177012, 62001133, and 61967005) and the Innovation Project of GUET Graduate Education (Grant Number 2021YCXS027).
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
Additional information
Availability of data and materials
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Mo, J., Zhu, R., Yuan, H. et al. Student behavior recognition based on multitask learning. Multimed Tools Appl 82, 19091–19108 (2023). https://doi.org/10.1007/s11042-022-14100-7
DOI: https://doi.org/10.1007/s11042-022-14100-7