ABSTRACT
This paper proposes an advanced multi-instance learning method with multi-feature engineering and conservative optimization for engagement intensity prediction. The method was applied to the EmotiW Challenge 2020, where it demonstrated strong performance. The task is to predict the engagement level of a student watching an educational video under a range of conditions and in various environments. Because engagement intensity correlates strongly with facial movements, upper-body posture movements, and overall environmental movements within a given time interval, we extract these motion features and feed them into a deep regression model composed of long short-term memory (LSTM) layers, gated recurrent unit (GRU) layers, and a fully connected layer. To predict the engagement level precisely and robustly in long videos with challenging conditions such as darkness and complex backgrounds, a multi-feature engineering function extracts synchronized multi-modal features over a given period, capturing both short-term and long-term dependencies. Based on these engineered multi-features, in the first training stage we train models over all configurations and select those that maximize validation accuracy. In the second training stage, to avoid overfitting on the extremely small engagement dataset, we perform conservative optimization: a single Bi-LSTM layer with only 16 units is trained on the combined dataset (training + validation) using stratified 5-fold cross-validation. By applying a decision-level ensemble over the models from the two training stages, the proposed method won second place in the challenge (MSE: 0.061110 on the test set).
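The second-stage training and the final fusion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`stratified_kfold_indices`, `decision_level_ensemble`) are hypothetical, the deep models are omitted, and it assumes engagement labels take a small set of discrete levels so that a stratified split is meaningful.

```python
import numpy as np

def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping each discrete
    engagement level evenly distributed across the folds."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for level in np.unique(labels):
        idx = np.where(labels == level)[0]
        rng.shuffle(idx)
        # Deal this level's samples round-robin into the folds.
        for i, sample in enumerate(idx):
            folds[i % k].append(sample)
    return [np.sort(np.array(f)) for f in folds]

def decision_level_ensemble(preds_stage1, preds_stage2):
    """Fuse the two stages' per-clip regression outputs by averaging."""
    return (np.asarray(preds_stage1, dtype=float)
            + np.asarray(preds_stage2, dtype=float)) / 2.0

# Toy usage: 20 clips over 4 engagement levels.
labels = np.array([0.0, 0.33, 0.66, 1.0] * 5)
folds = stratified_kfold_indices(labels, k=5)
assert sum(len(f) for f in folds) == len(labels)  # every clip assigned once

p1 = [0.50, 0.70]  # stage-1 model outputs for two test clips
p2 = [0.60, 0.80]  # stage-2 conservative model outputs
print(decision_level_ensemble(p1, p2))  # [0.55 0.75]
```

Averaging at the decision level (rather than fusing features) lets the two stages keep independent architectures; each fold's held-out split serves as the validation set for one conservative model.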