Abstract
Facial expression recognition is gradually being integrated into classroom environments for educational assessment. Most existing algorithms are designed for single frontal faces and are less effective on multi-face images captured in real classrooms. In particular, detecting small faces is a common and challenging problem because of low video resolution, blurred images, and limited feature information. To address these issues, we improved YOLOv5 with the idea of feature enhancement (FE-YOLOv5) and applied it to classroom teaching scenarios. With Resnet-34_Focal as the expression classification network, the overall framework is called FEMFER. The feature enhancement fuses more feature-map information through the proposed upsampling (UPS) module and the Convolution-Batch normalization-Leaky ReLU (CBL) module. The UPS module reduces the network’s local receptive field and effectively learns detailed information from the backbone. The CBL module speeds up model convergence while increasing the nonlinearity of the features. With feature enhancement, the network extracts and fuses features efficiently, which makes it better suited to small-face detection in classrooms and addresses the original network’s inaccurate recognition of small targets. Our method achieved an mAP of 81.42%, which is 7.18% higher than that of the original YOLOv5 algorithm. FEMFER assesses positive, neutral, and negative emotions, but it is currently limited to single-modal information extraction. Future research could fuse multi-modal information such as gestures and voice to achieve more accurate affective computing.
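As an illustration of the Convolution-Batch normalization-Leaky ReLU ordering that the CBL module is named after, the following is a minimal sketch in plain Python. It is not the FE-YOLOv5 implementation: it works on one-dimensional, single-channel signals, leaves out the learnable scale and shift of batch normalization, and the function names, the example kernel, and the negative slope of 0.1 are all assumptions made for this example.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D cross-correlation of a single-channel signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def batch_norm(batch, eps=1e-5):
    """Normalize one channel to zero mean and unit variance over the whole batch.

    The learnable scale (gamma) and shift (beta) are omitted, i.e. gamma=1, beta=0.
    """
    values = [v for row in batch for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[(v - mean) * scale for v in row] for row in batch]


def leaky_relu(batch, slope=0.1):
    """Pass positive values through and scale negative values by `slope` (assumed 0.1)."""
    return [[v if v > 0 else slope * v for v in row] for row in batch]


def cbl(batch, kernel, slope=0.1):
    """Hypothetical CBL block: convolution, then batch normalization, then Leaky ReLU."""
    convolved = [conv1d(row, kernel) for row in batch]
    return leaky_relu(batch_norm(convolved), slope)


if __name__ == "__main__":
    # Two toy signals and a simple difference kernel (both made up for illustration).
    toy_batch = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
    print(cbl(toy_batch, kernel=[1.0, 0.0, -1.0]))
```

In this sketch, batch normalization rescales the convolution outputs before the activation, and the Leaky ReLU keeps a small non-zero response for negative inputs instead of setting them to zero.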
Data availability
All relevant data are included within the article.
Acknowledgments
This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC, the Social Science Research of the Education Department of Jilin Province (JJKH20210901SK), the Jilin Educational Scientific Research Leading Group (ZD21003), and the Humanities and Social Science Foundation of Changchun Normal University (2020[011]).
Ethics declarations
Conflict of interest
The authors declare no potential conflicts of interest for the research, authorship, and/or publication of this article.
About this article
Cite this article
Bie, M., Liu, Q., Xu, H. et al. FEMFER: feature enhancement for multi-faces expression recognition in classroom images. Multimed Tools Appl 83, 6183–6203 (2024). https://doi.org/10.1007/s11042-023-15808-w
DOI: https://doi.org/10.1007/s11042-023-15808-w