
Surveillance video analysis for student action recognition and localization inside computer laboratories of a smart campus

Multimedia Tools and Applications

Abstract

In the era of the smart campus, unobtrusive monitoring of students is a challenging task. The monitoring system must be able to recognize and localize the actions performed by students. Many deep neural network based approaches have recently been proposed to automate Human Action Recognition (HAR) in various domains, but they have not been explored in learning environments. HAR can be applied in classrooms, laboratories, and libraries to make the teaching-learning process more effective. To this end, in this study we propose a system for recognizing and localizing student actions in still images extracted from Closed-Circuit Television (CCTV) videos of computer laboratories. The proposed method uses You Only Look Once version 3 (YOLOv3), a state-of-the-art real-time object detection model, for the localization and recognition of students' actions. In addition, image template matching is used to reduce the number of image frames to be analyzed, thereby speeding up video processing. Because human actions are domain specific and no standard dataset is available for student action recognition in smart computer laboratories, we created the STUDENT ACTION dataset from image frames captured by the CCTV cameras in a computer laboratory of a university campus. The proposed method recognizes various actions performed by students at different locations within an image frame. It performs considerably better on actions with many training samples than on actions with few samples.
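As a rough illustration of the frame-reduction step described above, the sketch below uses OpenCV's cv2.matchTemplate to compare each incoming frame against the last retained frame and skip near-duplicates before detection. The video path, the 0.95 similarity threshold, and the hand-off to a YOLOv3 detector are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions, not the authors' exact pipeline): use
# OpenCV template matching to drop near-duplicate CCTV frames before
# handing the surviving frames to a YOLOv3-style action detector.
import cv2

SIMILARITY_THRESHOLD = 0.95  # assumed cut-off; tune per camera/scene


def select_key_frames(video_path, threshold=SIMILARITY_THRESHOLD):
    """Yield only frames that differ enough from the last kept frame."""
    cap = cv2.VideoCapture(video_path)
    last_kept = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if last_kept is None:
            last_kept = gray
            yield frame
            continue
        # With an image and template of equal size, matchTemplate
        # returns a single normalized correlation score in [-1, 1].
        score = cv2.matchTemplate(gray, last_kept, cv2.TM_CCOEFF_NORMED)[0][0]
        if score < threshold:  # scene changed enough: keep this frame
            last_kept = gray
            yield frame
    cap.release()


if __name__ == "__main__":
    # "lab_cam.mp4" is a placeholder for a laboratory CCTV recording.
    for i, frame in enumerate(select_key_frames("lab_cam.mp4")):
        cv2.imwrite(f"key_frame_{i:05d}.jpg", frame)
        # Each retained frame would then be passed to the trained
        # YOLOv3 model for student action detection and localization.
```

Comparing each frame against the last retained frame, rather than the immediately preceding one, prevents slowly drifting scenes from never triggering a new key frame.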




Author information


Corresponding author

Correspondence to M. Rashmi.

Ethics declarations

The authors obtained all ethical approvals from the Institutional Ethics Committee (IEC) of the National Institute of Technology Karnataka, Surathkal, Mangalore, India, and written consent was obtained from the human subjects.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Rashmi, M., Ashwin, T.S. & Guddeti, R.M.R. Surveillance video analysis for student action recognition and localization inside computer laboratories of a smart campus. Multimed Tools Appl 80, 2907–2929 (2021). https://doi.org/10.1007/s11042-020-09741-5

