research-article

Simulating Student Interactions with Two-stage Imitation Learning for Intelligent Educational Systems

Authors:

Enhong Chen

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 3423 - 3432

https://doi.org/10.1145/3583780.3615060

Published: 21 October 2023

Abstract

The fundamental task of intelligent educational systems is to offer students adaptive learning services, such as exercise recommendation and computerized adaptive testing. In practice, however, optimizing the models behind these services is hampered by the difficulty of collecting high-quality interaction data. A student simulator is therefore of great value, since it can generate valid interactions to help optimize those models. Existing simulators have achieved success but generally suffer from exposure bias and overlook students' long-term intentions. To tackle these problems, we propose a novel Direct-Adversarial Imitation Student Simulator (DAISim), which formulates student simulation as a Markov Decision Process (MDP). This formulation unifies the simulator's workflow across training and generation, alleviating both exposure bias and the single-step optimization problem. To capture the intentions underlying complex student interactions, we first propose a direct imitation strategy that mimics the interactions with a simple reward function. We then propose an adversarial imitation strategy that learns a rational distribution with the reward given by a parameterized discriminator. Furthermore, we optimize the discriminator in adversarial imitation in a pairwise manner, and our theoretical analysis shows that the pairwise discriminator improves generation quality. Extensive experiments on real-world datasets demonstrate that DAISim can simulate high-quality student interactions whose distribution is close to the real distribution and can benefit several downstream services.
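
As a rough illustration of the two reward signals described above, the sketch below pairs a match-based reward for direct imitation with a pairwise discriminator loss for adversarial imitation. The abstract does not give DAISim's actual formulas, so everything here is an assumption: `direct_reward` is a hypothetical stand-in for the "simple reward function", and `pairwise_discriminator_loss` uses the relativistic GAN form, -log sigmoid(D(real) - D(generated)), as one well-known way to train a discriminator on real/generated pairs.

```python
import math


def direct_reward(simulated, observed):
    """Hypothetical 'simple reward': 1 if the simulated interaction
    matches the logged one, else 0 (an assumption, not DAISim's reward)."""
    return 1.0 if simulated == observed else 0.0


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pairwise_discriminator_loss(real_scores, generated_scores):
    """Mean of -log(sigmoid(real - generated)) over paired scores.

    Relativistic form, used here as an assumption; the paper's exact
    pairwise objective is not stated in the abstract.
    """
    if len(real_scores) != len(generated_scores) or not real_scores:
        raise ValueError("need two non-empty score lists of equal length")
    # -log(sigmoid(r - g)) equals softplus(g - r)
    losses = [softplus(g - r) for r, g in zip(real_scores, generated_scores)]
    return sum(losses) / len(losses)
```

When the discriminator scores a real and a generated interaction equally, the loss for that pair is log 2; it shrinks as real interactions are scored above generated ones, which is the signal the simulator would then be rewarded for closing.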

Index Terms

1. Applied computing
  1. Education
    1. E-learning
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Reinforcement learning
        Adversarial learning

Published In

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN: 9798400701245

DOI: 10.1145/3583780

General Chairs:
Ingo Frommholz, University of Wolverhampton, UK
Frank Hopfgartner, University of Koblenz, Germany
Mark Lee, University of Birmingham, UK
Michael Oakes, University of Birmingham, UK

Program Chairs:
Mounia Lalmas, Spotify, UK
Min Zhang, Tsinghua University, China
Rodrygo Santos, Federal University of Minas Gerais, Brazil

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

the National Key Research and Development Program of China
the National Natural Science Foundation of China
the University Synergy Innovation Program of Anhui Province

Conference

CIKM '23

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
