The Impact of Batch Deep Reinforcement Learning on Student Performance: A Simple Act of Explanation Can Go A Long Way

Sanz Ausin, Markel; Maniktala, Mehak; Barnes, Tiffany; Chi, Min

doi:10.1007/s40593-022-00312-3

Article
Published: 28 November 2022

Volume 33, pages 1031–1056 (2023)
International Journal of Artificial Intelligence in Education

Markel Sanz Ausin (ORCID: orcid.org/0000-0002-4526-9252)¹, Mehak Maniktala¹, Tiffany Barnes¹ & Min Chi¹

Abstract

While reinforcement learning (RL), especially Deep RL (DRL), has shown outstanding performance in video games, there is little evidence that DRL can be successfully applied to human-centric tasks, where the ultimate RL goal is to make human-agent interactions productive and fruitful. In complex, real-life, human-centric tasks such as education and healthcare, data can be noisy and limited. Batch RL is designed for exactly these situations, where data is limited and noisy and building simulations is challenging. In two consecutive empirical studies, we investigated Batch DRL for pedagogical policy induction, that is, for choosing student learning activities in an Intelligent Tutoring System. In Fall 2018 (F18), we compared the Batch DRL policy to an Expert policy but found no significant difference between the two. In Spring 2019 (S19), we augmented the Batch DRL-induced policy with a simple act of explanation: showing a message such as “The AI agent thinks you should view this problem as a Worked Example to learn how some new rules work.” We compared this policy against two conditions: the Expert policy and a student decision-making policy. Our results show that 1) the Batch DRL policy with explanations improved student learning performance significantly more than the Expert policy; and 2) no significant differences were found between the Expert policy and student decision-making. Overall, our results suggest that pairing simple explanations with a Batch DRL policy can be an important and effective technique for applying RL to real-life, human-centric tasks.
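
The abstract combines two ingredients: a policy induced offline from a fixed log of student interactions (batch RL), and a one-line message explaining each decision. The following is a minimal sketch of how these could fit together, not the authors' method. It swaps in tabular fitted Q-iteration for the deep network the paper uses, and the states, rewards, logged transitions, and action labels ("WE" for Worked Example, "PS" for Problem Solving) are all hypothetical. Only the Worked Example message is quoted from the abstract; the Problem Solving wording is invented for illustration.

```python
from collections import defaultdict

# Hypothetical action labels: WE = Worked Example, PS = Problem Solving.
ACTIONS = ("WE", "PS")

# Hypothetical logged transitions: (state, action, reward, next_state, done).
# States and rewards are invented for illustration only.
EXAMPLE_LOG = [
    ("novice", "WE", 0.0, "practiced", False),
    ("novice", "PS", -1.0, "practiced", False),
    ("practiced", "PS", 1.0, "end", True),
    ("practiced", "WE", 0.2, "end", True),
]

# The WE message is quoted from the abstract; the PS wording is hypothetical.
EXPLANATIONS = {
    "WE": ("The AI agent thinks you should view this problem as a Worked "
           "Example to learn how some new rules work."),
    "PS": ("The AI agent thinks you should solve this problem yourself "
           "to practice the rules you have learned."),
}


def batch_q_learning(transitions, gamma=0.9, sweeps=50):
    """Fitted Q-iteration with a lookup table as the regressor.

    Every sweep reuses the same fixed log; no new interactions are
    collected, which is the defining constraint of batch (offline) RL.
    """
    q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(sweeps):
        totals, counts = defaultdict(float), defaultdict(int)
        for s, a, r, s_next, done in transitions:
            # Bellman target: immediate reward plus discounted best next value.
            if done:
                target = r
            else:
                target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
            totals[(s, a)] += target
            counts[(s, a)] += 1
        # The "regression" step: average the targets for each logged pair.
        q = defaultdict(float, {k: totals[k] / counts[k] for k in totals})
    return q


def act_with_explanation(q, state):
    """Pick the greedy action and pair it with a one-line explanation."""
    action = max(ACTIONS, key=lambda a: q[(state, a)])
    return action, EXPLANATIONS[action]


if __name__ == "__main__":
    q_table = batch_q_learning(EXAMPLE_LOG)
    for state in ("novice", "practiced"):
        action, message = act_with_explanation(q_table, state)
        print(state, action, message)
```

Averaging the targets for each logged pair is the simplest possible regressor. Leaving unseen pairs at 0.0 also hides the extrapolation error that dedicated batch DRL algorithms are designed to control, so the sketch should not be read as a substitute for them.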

Notes

A node is ‘needed’ when its deletion would make a solution incomplete.
More details on Fall 2018 student demographics at NCSU can be found at https://www.engr.ncsu.edu/ir/fast-facts/fall-2018-fast-facts/
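
The first note defines a "needed" node by deletion, and that definition can be checked directly by brute force. The minimal sketch here assumes (the note does not say so) that a solution is stored as an acyclic derivation graph in which each derived node lists the nodes it was derived from, and that a solution is complete when its goal node can still be derived. The function names, node names, and example graph are hypothetical.

```python
def is_complete(proof, goal, removed=frozenset()):
    """Return True if `goal` can still be derived after deleting `removed`.

    `proof` maps every node to the list of nodes it was derived from;
    given premises map to an empty list. The graph is assumed acyclic.
    """
    memo = {}

    def derivable(node):
        # A deleted or unknown node cannot support anything.
        if node in removed or node not in proof:
            return False
        if node not in memo:
            # A node holds only if every node it was derived from still holds.
            memo[node] = all(derivable(parent) for parent in proof[node])
        return memo[node]

    return derivable(goal)


def needed_nodes(proof, goal):
    """Nodes whose deletion alone would leave the solution incomplete."""
    return {n for n in proof if not is_complete(proof, goal, frozenset({n}))}


# Hypothetical solution: D is derived from givens A and B, the goal E from D;
# F is derived from C but never used toward E.
EXAMPLE_PROOF = {"A": [], "B": [], "C": [], "D": ["A", "B"], "E": ["D"], "F": ["C"]}

if __name__ == "__main__":
    print(sorted(needed_nodes(EXAMPLE_PROOF, "E")))  # ['A', 'B', 'D', 'E']
```

Here C and F are not needed: deleting either one leaves the derivation of E intact, whereas deleting A, B, D, or E itself breaks it.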

Acknowledgements

This research was supported by the following NSF grants: CAREER: Improving Adaptive Decision Making in Interactive Learning Environments (#1651909), Integrated Data-driven Technologies for Individualized Instruction in STEM Learning Environments (#1726550), Generalizing Data-Driven Technologies to Improve Individualized STEM Instruction by Intelligent Tutors (#2013502), and Educational Data Mining for Individualized Instruction in STEM Learning Environments (#1432156).

Author information

Authors and Affiliations

North Carolina State University, Raleigh, USA
Markel Sanz Ausin, Mehak Maniktala, Tiffany Barnes & Min Chi

Corresponding author

Correspondence to Markel Sanz Ausin.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sanz Ausin, M., Maniktala, M., Barnes, T. et al. The Impact of Batch Deep Reinforcement Learning on Student Performance: A Simple Act of Explanation Can Go A Long Way. Int J Artif Intell Educ 33, 1031–1056 (2023). https://doi.org/10.1007/s40593-022-00312-3

Accepted: 05 September 2022
Published: 28 November 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s40593-022-00312-3
