Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes

Applied Intelligence

Abstract

In classroom scenes, multi-object behaviour recognition is hampered by heavy occlusion, invisible keypoints, and scale variation among crowded objects, all of which directly degrade recognition performance. Densely packed students and visually similar behaviours make the task harder still. We therefore propose a multi-object behaviour recognition method based on object detection cascaded with image classification. Specifically, an object detector extracts student objects, and a Vision Transformer (ViT) then classifies each student's behaviour. Because the accuracy of behaviour recognition depends on the detector, we first improve detection performance. We propose a Shallow Auxiliary Module that assists the backbone network in extracting hybrid multi-scale feature information; the multi-scale and multi-channel feature information is fused to alleviate object overlap and scale variation. We also propose a Scale Assignment Fusion Mechanism that non-heuristically guides objects to learn the optimal feature layer. Furthermore, Anchor-free Dynamic Label Assignment suppresses the prediction of low-quality bounding boxes, stabilising training and improving detection performance. The proposed student object detector achieves an mAP\(^{50}\) of 88.03 and an AP\(_l\) of 57.64, outperforming state-of-the-art object detection methods. Our multi-object behaviour recognition method recognises four behaviour classes and performs significantly better than the other methods compared.
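The cascade described in the abstract (detect student objects, then classify each detected region) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_detector` and `toy_classifier` are hypothetical stand-ins for the proposed detector and the ViT classifier, the image is a toy nested list, and the labels `class_0` and `class_1` are placeholders rather than the paper's behaviour classes.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels, x2 and y2 exclusive
Image = List[List[int]]           # toy grayscale image stored as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by `box` out of `image`, clamped to the image bounds."""
    height, width = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)
    return [row[x1:x2] for row in image[y1:y2]]


def recognise_behaviours(
    image: Image,
    detect: Callable[[Image], List[Box]],
    classify: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1: detect objects. Stage 2: classify the crop of each detected object."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        if not patch or not patch[0]:   # skip boxes that lie entirely outside the image
            continue
        results.append((box, classify(patch)))
    return results


# Hypothetical stand-ins so the sketch runs without trained models.
def toy_detector(image: Image) -> List[Box]:
    # Two boxes inside the image and one completely outside it.
    return [(0, 0, 2, 2), (2, 0, 4, 2), (10, 10, 12, 12)]


def toy_classifier(patch: Image) -> str:
    # Placeholder rule: threshold the mean intensity of the crop.
    mean = sum(sum(row) for row in patch) / (len(patch) * len(patch[0]))
    return "class_1" if mean > 127 else "class_0"


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
predictions = recognise_behaviours(image, toy_detector, toy_classifier)
print(predictions)
```

Passing the detector and classifier as callables keeps the two stages independent, which mirrors the point made in the abstract: the classifier only ever sees the crops the detector produces, so any missed or poorly localised box limits what the second stage can recognise.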

Data Availability and Access

Data will be made available on request.

Acknowledgements

This work was supported in part by the Key Research and Development Program of Shaanxi Province, China (Program No. 2023-YBGY-205), the Natural Science Basic Research Program of Shaanxi, China (Program No. 2024JC-ZDXM-40), the Key Research and Development Program of Shaanxi, China (Program No. 2024GX-YBXM-039) and the Innovation Capability Support Program of Shaanxi, China (No. 2023-CX-TD-08).

Author information

Contributions

Min Dang: Conceptualization, Methodology, Writing - Original Draft. Gang Liu: Investigation, Project administration, Formal analysis. Hao Li: Validation, Writing - Review & Editing. Qijie Xu: Resources, Data Curation. Xu Wang: Resources, Visualization. Rong Pan: Writing - Review & Editing.

Corresponding author

Correspondence to Gang Liu.

Ethics declarations

Ethical and Informed Consent for Data Used

Because the data used in this study contain private information about students, we guaranteed to the Classroom Video Management Center of Xidian University that the data would be used only for our research and would not be disclosed to the public. To protect the rights and interests of the students, the original data will not be shared. The Classroom Video Management Center of Xidian University provided written informed consent for this study.

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Dang, M., Liu, G., Li, H. et al. Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes. Appl Intell 54, 4935–4951 (2024). https://doi.org/10.1007/s10489-024-05409-x

  • DOI: https://doi.org/10.1007/s10489-024-05409-x
