DOI: 10.1145/3313831.3376518
research-article

Reinforcement Learning for the Adaptive Scheduling of Educational Activities

Published: 23 April 2020

Abstract

Adaptive instruction for online education can increase learning gains and decrease the work required of learners, instructors, and course designers. Reinforcement Learning (RL) is a promising tool for developing instructional policies, as RL models can learn complex relationships between course activities, learner actions, and educational outcomes. This paper demonstrates the first RL model to schedule educational activities in real time for a large online course through active learning. Our model learns to assign a sequence of course activities while maximizing learning gains and minimizing the number of items assigned. Using a controlled experiment with over 1,000 learners, we investigate how this scheduling policy affects learning gains, dropout rates, and qualitative learner feedback. We show that our model produces better learning gains using fewer educational activities than a linear assignment condition, and produces similar learning gains to a self-directed condition using fewer educational activities and with lower dropout rates.
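The trade-off described in the abstract, maximizing learning gains while minimizing the number of items assigned, can be illustrated with a toy episode-level objective. This is only a sketch under stated assumptions: the abstract does not give the paper's actual reward, so the `Episode` fields, the `episode_return` function, and the `activity_cost` weight below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One learner's pass through a scheduled sequence (hypothetical schema)."""
    pre_score: float    # assessment score before instruction, in [0, 1]
    post_score: float   # assessment score after instruction, in [0, 1]
    n_activities: int   # number of activities the policy assigned


def episode_return(ep: Episode, activity_cost: float = 0.01) -> float:
    """Learning gain minus a per-activity penalty.

    activity_cost is an assumed weight, not a value from the paper; it sets
    how much learning gain one extra assigned activity must buy to be worth it.
    """
    gain = ep.post_score - ep.pre_score
    return gain - activity_cost * ep.n_activities


# Two schedules with the same learning gain: the shorter one scores higher.
short = Episode(pre_score=0.4, post_score=0.8, n_activities=10)
long = Episode(pre_score=0.4, post_score=0.8, n_activities=25)
print(episode_return(short) > episode_return(long))
```

Under an objective of this shape, a policy is rewarded for reaching the same learning gain with fewer assigned activities, which is the comparison the abstract reports against the linear assignment condition.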

    Published In

    CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems
    April 2020
    10688 pages
    ISBN:9781450367080
    DOI:10.1145/3313831
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Badges

    • Honorable Mention

    Author Tags

    1. adaptive learning
    2. online education
    3. reinforcement learning

    Qualifiers

    • Research-article

    Conference

    CHI '20
    Acceptance Rates

    Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
