ABSTRACT
This paper proposes an advanced multi-instance learning method with multi-feature engineering and conservative optimization for engagement intensity prediction. The method was applied to the EmotiW Challenge 2020, where it demonstrated strong performance. The task is to predict the engagement level of a student watching an educational video under a range of conditions and in various environments. Because engagement intensity correlates strongly with facial movements, upper-body posture movements, and overall environmental movements within a given time interval, we extract these motion features and feed them into a deep regression model composed of long short-term memory (LSTM) layers, gated recurrent unit (GRU) layers, and a fully connected layer. To predict the engagement level precisely and robustly in long videos with challenging conditions such as darkness and complex backgrounds, a multi-feature engineering function extracts synchronized multi-modal features over a given period, capturing both short-term and long-term dependencies. Based on these engineered multi-features, in the first training stage we train models over all configurations and select those that maximize validation accuracy. In the second training stage, to avoid overfitting on the extremely small engagement dataset, we perform conservative optimization: a single Bi-LSTM layer with only 16 units is trained on the combined dataset (training + validation) using stratified 5-fold cross-validation. By applying a decision-level ensemble over the models from the two training stages, the proposed method won second place in the challenge (MSE: 0.061110 on the test set).
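The second-stage training and the final fusion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`stratified_kfold_indices`, `decision_level_ensemble`) are hypothetical, the deep models are omitted, and it assumes engagement labels take a small set of discrete levels so that a stratified split is meaningful.

```python
import numpy as np

def stratified_kfold_indices(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping each discrete
    engagement level evenly distributed across the folds."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for level in np.unique(labels):
        idx = np.where(labels == level)[0]
        rng.shuffle(idx)
        # Deal this level's samples round-robin into the folds.
        for i, sample in enumerate(idx):
            folds[i % k].append(sample)
    return [np.sort(np.array(f)) for f in folds]

def decision_level_ensemble(preds_stage1, preds_stage2):
    """Fuse the two stages' per-clip regression outputs by averaging."""
    return (np.asarray(preds_stage1, dtype=float)
            + np.asarray(preds_stage2, dtype=float)) / 2.0

# Toy usage: 20 clips over 4 engagement levels.
labels = np.array([0.0, 0.33, 0.66, 1.0] * 5)
folds = stratified_kfold_indices(labels, k=5)
assert sum(len(f) for f in folds) == len(labels)  # every clip assigned once

p1 = [0.50, 0.70]  # stage-1 model outputs for two test clips
p2 = [0.60, 0.80]  # stage-2 conservative model outputs
print(decision_level_ensemble(p1, p2))  # [0.55 0.75]
```

Averaging at the decision level (rather than fusing features) lets the two stages keep independent architectures; each fold's held-out split serves as the validation set for one conservative model.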