Advanced Multi-Instance Learning Method with Multi-features Engineering and Conservative Optimization for Engagement Intensity Prediction

Published: 22 October 2020

ABSTRACT

This paper proposes an advanced multi-instance learning method with multi-features engineering and conservative optimization for engagement intensity prediction. The method was applied to the EmotiW Challenge 2020 and achieved strong results. The task is to predict the engagement level of a student watching an educational video under a range of conditions and in various environments. Because engagement intensity correlates strongly with facial movements, upper-body posture movements, and overall environmental movements within a given time interval, we extract these motion features and feed them into a deep regression model built from long short-term memory (LSTM) layers, gated recurrent unit (GRU) layers, and a fully connected layer. To predict engagement precisely and robustly in long videos with difficult conditions such as darkness and complex backgrounds, a multi-features engineering function extracts synchronized multimodal features over a given period of time, capturing both short-term and long-term dependencies. Based on these engineered multi-features, the first training stage trains models across all configurations and keeps those that maximize validation accuracy. The second training stage applies conservative optimization to avoid overfitting on the extremely small engagement dataset: a single Bi-LSTM layer with only 16 units is trained on the combined training and validation sets (train + validation) using stratified 5-fold cross validation. A decision-level ensemble of the models from both training stages finally won second place in the challenge (MSE: 0.061110 on the test set).
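The abstract's second-stage recipe can be made concrete with a minimal sketch in Keras and scikit-learn: a single 16-unit Bi-LSTM regressor trained with stratified 5-fold cross validation over the pooled train + validation data, with fold models combined by decision-level averaging. The feature dimensions, window length, label scale ({0, 1, 2, 3}), and training hyperparameters below are illustrative assumptions, not the authors' reported configuration.

```python
# Sketch of the "conservative" second-stage model described in the abstract:
# one Bi-LSTM layer with 16 units, stratified 5-fold CV, decision-level ensemble.
# Shapes, label scale, and hyperparameters are assumptions for illustration.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

def build_conservative_model(n_segments=30, n_features=64):
    """A single 16-unit Bi-LSTM plus a fully connected regression head."""
    model = Sequential([
        Bidirectional(LSTM(16), input_shape=(n_segments, n_features)),
        Dense(1, activation="sigmoid"),  # engagement intensity scaled to [0, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def train_with_stratified_cv(X, y, n_splits=5, epochs=50):
    """Train one model per fold on the pooled train+validation data.
    X: (n_clips, n_segments, n_features); y: discrete labels in {0, 1, 2, 3}
    (assumed), which also serve to stratify the folds."""
    models = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, y):
        model = build_conservative_model(X.shape[1], X.shape[2])
        model.fit(X[train_idx], y[train_idx] / 3.0,      # scale labels to [0, 1]
                  validation_data=(X[val_idx], y[val_idx] / 3.0),
                  epochs=epochs, batch_size=16, verbose=0)
        models.append(model)
    return models

def ensemble_predict(models, X):
    """Decision-level ensemble: average the per-model regression outputs."""
    preds = np.stack([m.predict(X, verbose=0).ravel() for m in models])
    return preds.mean(axis=0) * 3.0  # map back to the 0-3 engagement scale
```

The deliberately tiny recurrent layer keeps the parameter count low relative to the small engagement dataset, which is the overfitting defense the abstract refers to as conservative optimization; the per-fold models then act as the second-stage members of the decision-level ensemble.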


Supplemental Material

3382507.3417959.mp4 (MP4, 10.1 MB)


Published in

ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
October 2020, 920 pages
ISBN: 9781450375818
DOI: 10.1145/3382507

Copyright © 2020 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

• research-article

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%
