Abstract
In classroom scenes, multi-object behaviour recognition suffers from heavy occlusion, invisible keypoints, and scale variation among crowded objects, all of which directly degrade recognition performance. Densely packed students and visually similar behaviours make the task more challenging still. We therefore propose a multi-object behaviour recognition method that cascades object detection with image classification. Specifically, an object detector extracts the student objects, and a Vision Transformer (ViT) then classifies the behaviour of each student. Because the accuracy of behaviour recognition depends on the quality of the detections, the detection performance must be improved first. To this end, we propose a Shallow Auxiliary Module that assists the backbone network in extracting hybrid multi-scale feature information; the multi-scale and multi-channel features are fused to alleviate object overlap and scale variation. We also propose a Scale Assignment Fusion Mechanism that non-heuristically guides objects to learn the optimal feature layer. Furthermore, Anchor-free Dynamic Label Assignment suppresses the prediction of low-quality bounding boxes, which stabilises training and improves detection performance. The proposed student object detector achieves an mAP\(^{50}\) of 88.03 and an AP\(_l\) of 57.64, outperforming state-of-the-art object detection methods. Our multi-object behaviour recognition method recognises four behaviour classes and performs significantly better than the other compared methods.
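The two-stage cascade described in the abstract (detect student objects, then classify each cropped object) can be illustrated with a minimal Python sketch. It shows only the control flow. The names `recognise_behaviours`, `toy_detector`, and `toy_classifier`, the toy image format, and the labels `behaviour_0` and `behaviour_1` are hypothetical placeholders; they do not represent the proposed detector, the ViT classifier, or the four behaviour classes of the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def recognise_behaviours(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    classify: Callable[[Image], str],
) -> List[Recognition]:
    """Stage 1 detects object boxes; stage 2 classifies each cropped box."""
    return [Recognition(box, classify(crop(image, box))) for box in detect(image)]


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_detector(image: Image) -> List[Box]:
    # Assumption: always returns two fixed boxes, left half and right half.
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def toy_classifier(patch: Image) -> str:
    # Assumption: thresholds the mean intensity; a real system would run a ViT.
    total = sum(sum(row) for row in patch)
    count = sum(len(row) for row in patch)
    return "behaviour_0" if total / count > 127 else "behaviour_1"


frame = [[200, 200, 10, 10],
         [200, 200, 10, 10]]
results = recognise_behaviours(frame, toy_detector, toy_classifier)
print([(r.box, r.label) for r in results])
```

Passing the detector and classifier as callables keeps the two stages independent. This mirrors the point made in the abstract that recognition accuracy depends on the quality of the detections fed to the classifier.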
Data Availability and Access
Data will be made available on request.
Acknowledgements
This work was supported in part by the Key Research and Development Program of Shaanxi Province, China (Program No. 2023-YBGY-205), the Natural Science Basic Research Program of Shaanxi, China (Program No. 2024JC-ZDXM-40), the Key Research and Development Program of Shaanxi, China (Program No. 2024GX-YBXM-039) and the Innovation Capability Support Program of Shaanxi, China (No. 2023-CX-TD-08).
Author information
Contributions
Min Dang: Conceptualization, Methodology, Writing - Original Draft. Gang Liu: Investigation, Project administration, Formal analysis. Hao Li: Validation, Writing - Review & Editing. Qijie Xu: Resources, Data Curation. Xu Wang: Resources, Visualization. Rong Pan: Writing - Review & Editing.
Ethics declarations
Ethical and Informed Consent for Data Used
Because the data used in this study involve the privacy of students, we guaranteed to the Classroom Video Management Center of Xidian University that the data would be used only for our research and would not be disclosed to the public. To protect the rights and interests of the students, we also guaranteed that the original data would not be shared. The Classroom Video Management Center of Xidian University provided written informed consent for this study.
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Dang, M., Liu, G., Li, H. et al. Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes. Appl Intell 54, 4935–4951 (2024). https://doi.org/10.1007/s10489-024-05409-x
DOI: https://doi.org/10.1007/s10489-024-05409-x