Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes

Dang, Min; Liu, Gang; Li, Hao; Xu, Qijie; Wang, Xu; Pan, Rong

doi:10.1007/s10489-024-05409-x


Published: 11 April 2024

Volume 54, pages 4935–4951 (2024)

Applied Intelligence


Abstract

For multi-object behaviour recognition in classroom scenes, crowded objects suffer from heavy occlusion, invisible keypoints and scale variation, all of which directly degrade recognition performance. Because student objects are densely packed and student behaviours are similar, multi-object behaviour recognition is highly challenging. We therefore propose multi-object behaviour recognition based on object detection cascaded with image classification. Specifically, object detection extracts the student objects, and a Vision Transformer (ViT) then classifies each student's behaviour. Because behaviour recognition accuracy depends on detection quality, the detector must be improved first. This paper proposes a Shallow Auxiliary Module that assists the backbone network in extracting hybrid multi-scale feature information; fusing the multi-scale and multi-channel features alleviates object overlap and scale variation. We also propose a Scale Assignment Fusion Mechanism that non-heuristically guides each object to learn from its optimal feature layer. Furthermore, the Anchor-free Dynamic Label Assignment suppresses low-quality bounding-box predictions, stabilising training and improving detection performance. The proposed student object detector achieves a state-of-the-art mAP\(^{50}\) of 88.03 and an AP\(_l\) of 57.64, outperforming existing object detection methods. Our multi-object behaviour recognition method recognises four behaviour classes and performs significantly better than the compared methods.
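The two-stage cascade described in the abstract (detect each student, then classify the cropped student) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the detector and the ViT classifier are passed in as hypothetical callables (`detect`, `classify`), boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates, the 0.5 detection threshold is an arbitrary choice, and the label names are placeholders because the abstract does not name the four behaviour classes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Assumed box convention: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class StudentBehaviour:
    box: Box
    det_score: float
    behaviour: str
    cls_score: float


def crop(image: np.ndarray, box: Box) -> np.ndarray:
    """Cut the box region out of an H x W x C image, clamped to the image bounds."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    left = int(np.clip(np.floor(x1), 0, w))
    top = int(np.clip(np.floor(y1), 0, h))
    right = int(np.clip(np.ceil(x2), 0, w))
    bottom = int(np.clip(np.ceil(y2), 0, h))
    return image[top:bottom, left:right]


def recognise_behaviours(
    image: np.ndarray,
    detect: Callable[[np.ndarray], Sequence[Tuple[Box, float]]],  # hypothetical detector
    classify: Callable[[np.ndarray], Sequence[float]],  # hypothetical classifier (e.g. a ViT)
    labels: Sequence[str],
    score_thresh: float = 0.5,  # assumed value, not taken from the paper
) -> List[StudentBehaviour]:
    """Stage 1: detect students. Stage 2: classify each cropped student."""
    results = []
    for box, det_score in detect(image):
        if det_score < score_thresh:
            continue  # drop low-confidence detections before classification
        patch = crop(image, box)
        if patch.size == 0:
            continue  # box fell entirely outside the image
        probs = np.asarray(classify(patch), dtype=float)
        k = int(np.argmax(probs))  # most probable behaviour class for this student
        results.append(StudentBehaviour(box, det_score, labels[k], float(probs[k])))
    return results


if __name__ == "__main__":
    # Toy stand-ins for the real models; placeholder label names.
    frame = np.zeros((64, 64, 3), dtype=np.uint8)
    fake_detect = lambda img: [((4, 4, 20, 30), 0.9), ((30, 30, 50, 60), 0.2)]
    fake_classify = lambda patch: [0.1, 0.7, 0.1, 0.1]
    names = ["behaviour_0", "behaviour_1", "behaviour_2", "behaviour_3"]
    for r in recognise_behaviours(frame, fake_detect, fake_classify, names):
        print(r)
```

Keeping the stages decoupled behind two callables means an improved detector can be swapped in without changing the classification stage's interface, which matches the abstract's point that recognition accuracy depends first on detection quality.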
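The detector is reported by mAP\(^{50}\): average precision in which a detection counts as correct when its IoU with a not-yet-matched ground-truth box is at least 0.5, averaged over classes. The sketch below computes this with greedy, score-ordered matching and all-point interpolation of the precision-recall curve (PASCAL VOC style). The abstract does not state the exact evaluation protocol, so treat this as an illustration of the metric rather than a reproduction of the reported 88.03; the function names and data layout are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: Sequence[Tuple[str, float, Box]],  # (image_id, score, box)
    ground_truth: Dict[str, List[Box]],  # image_id -> ground-truth boxes
    iou_thresh: float = 0.5,
) -> float:
    """AP for a single class, with all-point interpolation."""
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp, fp = [], []
    # Visit detections from most to least confident.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A hit needs IoU >= threshold with a ground-truth box not already claimed.
        if best_j >= 0 and best_iou >= iou_thresh and not used[img][best_j]:
            used[img][best_j] = True
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)
            fp.append(1)
    tp_cum = np.cumsum(tp, dtype=float)
    fp_cum = np.cumsum(fp, dtype=float)
    recall = tp_cum / n_gt
    precision = tp_cum / np.maximum(tp_cum + fp_cum, np.finfo(float).eps)
    # Make precision non-increasing in recall, then sum the area under the steps.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def mean_average_precision(per_class, iou_thresh: float = 0.5) -> float:
    """mAP: mean of per-class AP. per_class maps class -> (detections, ground_truth)."""
    aps = [average_precision(d, g, iou_thresh) for d, g in per_class.values()]
    return float(np.mean(aps)) if aps else 0.0
```

The functions return values in [0, 1]; papers usually report them multiplied by 100. If AP\(_l\) follows the COCO convention, it additionally restricts evaluation to large objects and averages over IoU thresholds from 0.50 to 0.95, which this sketch does not do.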


Data Availability and Access

Data will be made available on request.


Acknowledgements

This work was supported in part by the Key Research and Development Program of Shaanxi Province, China (Program No. 2023-YBGY-205), the Natural Science Basic Research Program of Shaanxi, China (Program No. 2024JC-ZDXM-40), the Key Research and Development Program of Shaanxi, China (Program No. 2024GX-YBXM-039) and the Innovation Capability Support Program of Shaanxi, China (No. 2023-CX-TD-08).

Author information

Authors and Affiliations

School of Computer Science and Technology, Xidian University, No. 266, Xinglong Section, Xifeng Road, Xi’an, 710126, Shaanxi, China
Min Dang, Gang Liu, Hao Li, Qijie Xu, Xu Wang & Rong Pan
Guangzhou Institute of Technology, Xidian University, No. 83, Zhiming Road, Xinlong Town, Huangpu District, Guangzhou, 510555, Guangdong, China
Gang Liu
Key Laboratory of Smart Human-Computer Interaction and Wearable Technology of Shaanxi Province, No. 266, Xinglong Section, Xifeng Road, Xi’an, 710126, Shaanxi, China
Min Dang, Gang Liu, Hao Li, Qijie Xu, Xu Wang & Rong Pan


Contributions

Min Dang: Conceptualization, Methodology, Writing - Original Draft. Gang Liu: Investigation, Project administration, Formal analysis. Hao Li: Validation, Writing - Review & Editing. Qijie Xu: Resources, Data Curation. Xu Wang: Resources, Visualization. Rong Pan: Writing - Review & Editing.

Corresponding author

Correspondence to Gang Liu.

Ethics declarations

Ethical and Informed Consent for Data Used

Because the data used in this study contain private information about students, we guaranteed to the Classroom Video Management Center of Xidian University that the data would be used only for our research and would not be disclosed to the public. To protect the rights and interests of the students, we also guaranteed that the original data would not be shared. The Classroom Video Management Center of Xidian University provided written informed consent for this study.

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Dang, M., Liu, G., Li, H. et al. Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes. Appl Intell 54, 4935–4951 (2024). https://doi.org/10.1007/s10489-024-05409-x


Accepted: 19 March 2024
Issue Date: March 2024
