Abstract
In this paper, we aim to optimize the sequencing of learning activities using Q-learning, a reinforcement learning method. At each step, the Q-learning agent decides which activity to propose to the student. The sequencing policy we propose is guided by the aim of efficiently improving the student's knowledge state. Thus, the Q-learning agent learns a mapping from student knowledge states to the optimal activity to perform in each state.
We tackle two main issues in implementing Q-learning off-policy: the combinatorial explosion of student knowledge states, and the definition of a reward function that efficiently improves the student's knowledge state. We rely on the student model and the domain model to address these two challenges.
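To illustrate the second challenge, a reward can be derived from the change in the student model's mastery estimates before and after an activity. The sketch below is a hypothetical example under assumed names (`knowledge_gain_reward`, per-skill mastery probabilities as a BKT-style student model might estimate them), not the paper's actual reward definition.

```python
def knowledge_gain_reward(mastery_before, mastery_after):
    """Hypothetical reward: total gain in per-skill mastery probabilities.

    Both arguments map skill ids to probabilities in [0, 1], as a
    BKT-style student model might estimate before and after an activity.
    """
    return sum(mastery_after[s] - mastery_before[s] for s in mastery_after)
```

A reward of this shape encourages the agent to pick activities that raise mastery the most, rather than merely activities the student answers correctly.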
We carried out a study evaluating the proposed approach on simulated students. We show that our approach is more efficient: it achieves a better learning gain with fewer activities than either a random policy or an expert-based policy.
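For readers unfamiliar with the method, the mapping from knowledge states to activities can be sketched as a standard tabular Q-learning loop. This is a minimal illustration with placeholder activity ids and hyperparameters, not the implementation evaluated in the paper.

```python
import random
from collections import defaultdict

# Assumed hyperparameters and activity set (placeholders for illustration).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIVITIES = ["a1", "a2", "a3"]

# Q-table: maps (knowledge_state, activity) pairs to estimated values.
Q = defaultdict(float)

def choose_activity(state):
    """Epsilon-greedy selection: explore sometimes, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIVITIES)
    return max(ACTIVITIES, key=lambda a: Q[(state, a)])

def update(state, activity, reward, next_state):
    """Standard Q-learning update toward the bootstrapped target."""
    best_next = max(Q[(next_state, a)] for a in ACTIVITIES)
    target = reward + GAMMA * best_next
    Q[(state, activity)] += ALPHA * (target - Q[(state, activity)])
```

Here `state` would be a (necessarily compressed) representation of the student's knowledge state; taming the size of that state space is precisely the first challenge discussed above.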
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Yessad, A. (2023). Using the ITS Components in Improving the Q-Learning Policy for Instructional Sequencing. In: Frasson, C., Mylonas, P., Troussas, C. (eds) Augmented Intelligence and Intelligent Tutoring Systems. ITS 2023. Lecture Notes in Computer Science, vol 13891. Springer, Cham. https://doi.org/10.1007/978-3-031-32883-1_21
Print ISBN: 978-3-031-32882-4
Online ISBN: 978-3-031-32883-1