Abstract
E-learning enables the dissemination of valuable academic content to all users regardless of where they are situated. One of the challenges faced by e-learning systems is the lack of constant interaction between the user and the system. The ability to observe learners’ reactions is an essential feature of a typical classroom setting, so such observation, in the form of facial expressions, should be incorporated into an e-learning platform. The proposed solution is the implementation of a deep-learning-based facial image analysis model that estimates the learning affect and reflects the level of student engagement. This work proposes the use of a Temporal Relational Network (TRN) for identifying changes in the emotions on students’ faces during an e-learning session. The TRN sparsely samples individual frames and then learns their causal relations, which is much more efficient than sampling dense frames and convolving them. In this paper, both single-scale and multi-scale temporal relations are considered to achieve the proposed goal. Furthermore, a Multi-Layer Perceptron (MLP) is also tested as a baseline classifier. The proposed framework is end-to-end trainable for video-based Facial Emotion Recognition (FER). The proposed FER model was tested on the open-source DISFA+ database. The TRN-based model showed a significant reduction in the length of the feature set while remaining effective in recognizing expressions. The multi-scale TRN produced better accuracy than the single-scale TRN and the MLP, with accuracies of 92.7%, 89.4%, and 86.6%, respectively.
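To make the sampling idea concrete: a single-scale (two-frame) temporal relation module sparsely samples a handful of ordered frames and sums a small relation function over frame pairs. The sketch below is an illustrative toy in NumPy, not the authors’ implementation; the feature dimensions, weight matrices `w1`/`w2`, and function names are all assumptions chosen for the example.

```python
import numpy as np

def sample_frames(num_frames: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Sparsely sample k distinct frame indices from a video, kept in temporal order."""
    return np.sort(rng.choice(num_frames, size=k, replace=False))

def relation_module(pair_feat: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """A tiny two-layer MLP (the relation function g) applied to a concatenated frame pair."""
    hidden = np.maximum(pair_feat @ w1, 0.0)  # ReLU
    return hidden @ w2

def two_frame_relation(frame_feats: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Single-scale relation: sum g over all temporally ordered pairs (i < j)."""
    k = frame_feats.shape[0]
    out = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair = np.concatenate([frame_feats[i], frame_feats[j]])
            out = out + relation_module(pair, w1, w2)
    return out
```

A multi-scale TRN extends this by also summing relation functions over sampled triples, quadruples, and so on, each scale with its own small MLP, before a final classification layer.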










Cite this article
Pise, A., Vadapalli, H. & Sanders, I. Facial emotion recognition using temporal relational network: an application to E-learning. Multimed Tools Appl 81, 26633–26653 (2022). https://doi.org/10.1007/s11042-020-10133-y