FEMFER: feature enhancement for multi-faces expression recognition in classroom images

Bie, Mei; Liu, Quanle; Xu, Huan; Gao, Yan; Che, Xiangjiu

doi:10.1007/s11042-023-15808-w

Published: 17 May 2023

Volume 83, pages 6183–6203, (2024)

Multimedia Tools and Applications

Mei Bie¹,², Quanle Liu¹, Huan Xu¹, Yan Gao¹ & Xiangjiu Che¹ (ORCID: orcid.org/0000-0003-3799-1963)

Abstract

Facial expression recognition is gradually being integrated into the classroom environment for educational assessment. Most existing algorithms are designed for single frontal faces and are less effective on multi-face images from real classrooms. In particular, detecting small faces is a common and challenging problem because of low video resolution, blurred images, and limited feature information. To address these issues, we improved YOLOv5 with the idea of feature enhancement (FE-YOLOv5) and applied it to classroom teaching scenarios. With Resnet-34_Focal as the expression classification network, the overall framework is called FEMFER. The feature enhancement fused more feature-map information through the proposed upsampling (UPS) module and the Convolution-Batch normalization-Leaky ReLU (CBL) module. The UPS module reduced the network’s local perceptual field and effectively learned detailed information from the backbone. The CBL module sped up model convergence while increasing the nonlinearity of the features. With feature enhancement, the network extracted and fused features efficiently, which made it better suited to small-face detection in classrooms and addressed the original network's inaccurate recognition of small targets. Our method achieved 81.42% mAP, a gain of 7.18% over the original YOLOv5 algorithm. FEMFER assessed positive, neutral, and negative emotions, but it is currently limited to single-modal information extraction. Further research could fuse multi-modal information such as gestures and voice to achieve more accurate affective computing.
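To make the CBL naming concrete, the sketch below chains the three operations the abstract lists: a convolution, a batch normalization, and a Leaky ReLU. It is an illustration only, not the authors' implementation. It assumes a single channel, plain Python lists instead of a tensor library, a hypothetical 2x2 kernel, a negative slope of 0.1, and batch normalization without the learnable scale and shift.

```python
import math


def conv2d_valid(image, kernel):
    """Single-channel 2-D cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def batch_norm(batch, eps=1e-5):
    """Normalize one channel over the whole batch and all spatial positions.

    The learnable scale and shift are omitted (assumed gamma=1, beta=0).
    """
    values = [v for fmap in batch for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(var + eps)
    return [[[(v - mean) * scale for v in row] for row in fmap] for fmap in batch]


def leaky_relu(batch, negative_slope=0.1):
    """Keep positive values; multiply non-positive values by a small slope."""
    return [
        [[v if v > 0 else negative_slope * v for v in row] for row in fmap]
        for fmap in batch
    ]


def cbl(batch, kernel, negative_slope=0.1):
    """Convolution -> batch normalization -> Leaky ReLU, applied in that order."""
    convolved = [conv2d_valid(image, kernel) for image in batch]
    return leaky_relu(batch_norm(convolved), negative_slope)


# Two hypothetical 3x3 single-channel inputs and a hypothetical 2x2 kernel.
batch = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]],
]
kernel = [[1.0, 0.0], [0.0, -1.0]]
features = cbl(batch, kernel)
```

The normalization step rescales the convolution output to zero mean and unit variance, which is the usual reason given for faster convergence, and the Leaky ReLU supplies the nonlinearity while still passing a small signal for negative inputs. In a real detector each step would be a multi-channel layer from a deep learning framework rather than nested lists.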

Data availability

All relevant data are included within the article.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China under Grant 62172184, the Science and Technology Development Plan of Jilin Province of China under Grants 20200401077GX and 20200201292JC, the Social Science Research of the Education Department of Jilin Province (JJKH20210901SK), the Jilin Educational Scientific Research Leading Group (ZD21003), and the Humanities and Social Science Foundation of Changchun Normal University (2020[011]).

Author information

Authors and Affiliations

1. College of Computer Science and Technology, Jilin University, Changchun, 130012, China
Mei Bie, Quanle Liu, Huan Xu, Yan Gao & Xiangjiu Che
2. Institute of Education, Changchun Normal University, Changchun, 130032, China
Mei Bie

Corresponding author

Correspondence to Xiangjiu Che.

Ethics declarations

Conflict of interest

The authors declare no potential conflicts of interest for the research, authorship, and/or publication of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

ESM 1 (DOCX 931 kb)

About this article

Cite this article

Bie, M., Liu, Q., Xu, H. et al. FEMFER: feature enhancement for multi-faces expression recognition in classroom images. Multimed Tools Appl 83, 6183–6203 (2024). https://doi.org/10.1007/s11042-023-15808-w

Received: 10 February 2022
Revised: 25 October 2022
Accepted: 09 May 2023
Published: 17 May 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15808-w
