
Integrating Gaze and Mouse Via Joint Cross-Attention Fusion Net for Students' Activity Recognition in E-learning

Published: 27 September 2023

Abstract

E-learning has emerged as an indispensable educational mode in the post-pandemic era. However, this mode makes it difficult for students to stay engaged in learning without appropriate activity monitoring. Our work explores a promising solution that combines gaze and mouse data to recognize students' activities, thereby facilitating activity monitoring and analysis during e-learning. We first surveyed 200 students from a local university and found greater acceptance of eye trackers and mouse loggers than of video surveillance. We then designed eight routine digital activities for students, collected a multimodal dataset, and analyzed the patterns of and correlations between gaze and mouse behavior across the activities. Our proposed Joint Cross-Attention Fusion Net, a multimodal activity recognition framework, leverages the gaze-mouse relationship to improve classification performance by fusing cross-modal representations through a cross-attention mechanism and incorporating joint features that characterize gaze-mouse coordination. Evaluation results show that our method achieves up to a 94.87% F1-score on the eight-class activity recognition task, an improvement of at least 7.44% over using gaze or mouse data alone. This research opens new possibilities for monitoring student engagement in intelligent education systems and suggests a promising strategy for combining perception and action modalities in behavioral analysis across a range of ubiquitous computing environments.
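To make the fusion idea in the abstract concrete, the following is a minimal PyTorch-style sketch of cross-attention fusion between gaze and mouse feature sequences for an eight-class activity classifier. It is not the authors' implementation: the GRU encoders, feature dimensions, mean pooling, and all module names are illustrative assumptions.

```python
# Minimal illustrative sketch (NOT the paper's implementation) of cross-attention
# fusion between gaze and mouse feature sequences for 8-class activity recognition.
import torch
import torch.nn as nn


class CrossAttentionFusionNet(nn.Module):
    def __init__(self, gaze_feat=4, mouse_feat=4, dim=64, heads=4, num_classes=8):
        super().__init__()
        # Per-modality sequence encoders (assumption: simple GRUs).
        self.gaze_enc = nn.GRU(gaze_feat, dim, batch_first=True)
        self.mouse_enc = nn.GRU(mouse_feat, dim, batch_first=True)
        # Cross-attention: each modality queries the other one.
        self.gaze2mouse = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.mouse2gaze = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        # Joint classifier over the concatenated, temporally pooled representations.
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))

    def forward(self, gaze, mouse):
        g, _ = self.gaze_enc(gaze)    # (batch, T_gaze, dim)
        m, _ = self.mouse_enc(mouse)  # (batch, T_mouse, dim)
        # Gaze features attend to mouse features, and vice versa, to capture coordination.
        g_att, _ = self.gaze2mouse(query=g, key=m, value=m)
        m_att, _ = self.mouse2gaze(query=m, key=g, value=g)
        joint = torch.cat([g_att.mean(dim=1), m_att.mean(dim=1)], dim=-1)
        return self.head(joint)       # logits over the 8 activity classes


# Toy usage: a batch of 2 windows with 50 gaze samples and 30 mouse events each.
model = CrossAttentionFusionNet()
logits = model(torch.randn(2, 50, 4), torch.randn(2, 30, 4))
print(logits.shape)  # torch.Size([2, 8])
```

In this sketch each modality queries the other, so the fused representation encodes how gaze and mouse movements co-occur within a window, which reflects the gaze-mouse coordination the abstract describes the framework as exploiting.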



• Published in

  Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 3
  September 2023
  1734 pages
  EISSN: 2474-9567
  DOI: 10.1145/3626192

      Copyright © 2023 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed