FEMFER: feature enhancement for multi-faces expression recognition in classroom images

Published in: Multimedia Tools and Applications

Abstract

Facial expression recognition is gradually being integrated into the classroom environment for educational assessment. Most existing algorithms are designed for single frontal faces and are less effective on multi-face images captured in real classrooms. In particular, detecting small faces is a common and challenging problem due to low video resolution, blurred images, and scarce feature information. To address these issues, we improved YOLOv5 with a feature-enhancement strategy (FE-YOLOv5) and applied it to classroom teaching scenarios. With Resnet-34_Focal as the expression classification network, the overall framework is called FEMFER. The feature enhancement fuses more feature-map information through the proposed upsampling (UPS) module and the Convolution-Batch normalization-Leaky ReLU (CBL) module. The UPS module narrows the network's local receptive field and effectively learns detailed information from the backbone, while the CBL module speeds up model convergence and increases the nonlinearity of the features. The network with the feature-enhancement method extracts and fuses features efficiently, making it better suited to small-face detection in classroom scenes and solving the original network's inaccurate recognition of small targets. Our method achieved 81.42% mAP, a 7.18% improvement over the original YOLOv5 algorithm. FEMFER intelligently assesses positive, neutral, and negative emotions, but it is currently limited to single-modal information extraction. Further research could fuse multi-modal information such as gestures and voice to realize more accurate affective computing.
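The "Focal" in Resnet-34_Focal refers to the focal loss, which down-weights well-classified examples so training concentrates on hard ones. A minimal sketch of the binary form follows; the alpha and gamma defaults shown are the values commonly used in the focal-loss literature, not values stated in this article.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single prediction.

    p     : predicted probability of the positive class (0 < p < 1)
    y     : ground-truth label, 1 or 0
    alpha : class-balancing weight (assumed default, not from the article)
    gamma : focusing parameter; gamma=0 recovers weighted cross-entropy
    """
    # p_t is the probability assigned to the true class
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of easy examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 2, a confident correct prediction (p_t = 0.9) contributes orders of magnitude less loss than a hard one (p_t = 0.1), which is why the loss suits the small, hard-to-classify faces this article targets.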


Data availability

All relevant data are included within the article.


Acknowledgments

This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC, the Social Science Research of the Education Department of Jilin Province (JJKH20210901SK), the Jilin Educational Scientific Research Leading Group (ZD21003), and the Humanities and Social Science Foundation of Changchun Normal University (2020[011]).

Author information

Corresponding author

Correspondence to Xiangjiu Che.

Ethics declarations

Conflict of interest

The authors declare no potential conflicts of interest for the research, authorship, and/or publication of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1

(DOCX 931 kb)


About this article


Cite this article

Bie, M., Liu, Q., Xu, H. et al. FEMFER: feature enhancement for multi-faces expression recognition in classroom images. Multimed Tools Appl 83, 6183–6203 (2024). https://doi.org/10.1007/s11042-023-15808-w

