Abstract

Reinforcement learning (RL) has shown significant potential for inducing data-driven scaffolding policies, but designing reward functions that lead to effective policies is challenging. A promising solution is to use inverse RL to learn a reward function from effective demonstrations. This paper presents an inverse RL-based reward design framework for inducing deep RL scaffolding policies in an adaptive learning environment. The framework centers on generating a data-driven model of immediate rewards by sampling high learning-gain episodes from previous student interactions and applying inverse RL. The resulting reward model is then used to induce an adaptive scaffolding policy with batch-constrained deep Q-learning. We evaluate this framework on data from 487 learners who completed an adaptive training course that provided direct instruction on principles of leading stability operations. Results show that the framework yields significantly better scaffolding policies more quickly than several RL baselines.
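
The abstract describes a three-step pipeline: treat high learning-gain episodes as demonstrations, fit an immediate-reward model to them via inverse RL, then learn an offline policy with batch-constrained deep Q-learning using that learned reward. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: all names and sizes (mlp, RewardNet, DiscreteBCQ, STATE_DIM, N_ACTIONS, BCQ_THRESHOLD) are hypothetical, the reward-learning step is a simple discriminator-style stand-in for whatever inverse RL method the paper uses, and the policy step follows the standard discrete batch-constrained Q-learning recipe.

```python
# Hedged sketch of the reward-design-then-policy-induction pipeline from the
# abstract. Names, sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS = 16, 4      # placeholder learner-state / scaffolding-action sizes
GAMMA, BCQ_THRESHOLD = 0.99, 0.3  # assumed hyperparameters

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# Step 1: sample high learning-gain episodes as demonstrations.
def split_by_gain(episodes, gains, quantile=0.75):
    """episodes: list of (states, actions) tensor pairs; gains: per-episode learning gain."""
    cutoff = torch.quantile(torch.tensor(gains, dtype=torch.float32), quantile)
    demos = [ep for ep, g in zip(episodes, gains) if g >= cutoff]
    others = [ep for ep, g in zip(episodes, gains) if g < cutoff]
    return demos, others

# Step 2: fit an immediate-reward model r(s, a) to the demonstrations.
class RewardNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = mlp(STATE_DIM + N_ACTIONS, 1)

    def forward(self, s, a):  # s: [B, STATE_DIM] floats, a: [B] action indices
        return self.net(torch.cat([s, F.one_hot(a, N_ACTIONS).float()], dim=-1)).squeeze(-1)

def train_reward(reward_net, demo_sa, other_sa, steps=1000, lr=1e-3):
    # Discriminator-style objective (an IRL stand-in): push rewards up on
    # demonstration transitions and down on the remaining transitions.
    opt = torch.optim.Adam(reward_net.parameters(), lr=lr)
    (sd, ad), (so, ao) = demo_sa, other_sa
    for _ in range(steps):
        loss = F.softplus(-reward_net(sd, ad)).mean() + F.softplus(reward_net(so, ao)).mean()
        opt.zero_grad(); loss.backward(); opt.step()

# Step 3: induce the scaffolding policy offline with discrete batch-constrained
# Q-learning, substituting the learned reward for a hand-designed one.
class DiscreteBCQ:
    def __init__(self):
        self.q, self.q_target = mlp(STATE_DIM, N_ACTIONS), mlp(STATE_DIM, N_ACTIONS)
        self.q_target.load_state_dict(self.q.state_dict())
        self.bc = mlp(STATE_DIM, N_ACTIONS)  # behavior-cloning head that constrains actions
        self.opt = torch.optim.Adam(list(self.q.parameters()) + list(self.bc.parameters()), lr=1e-3)

    def train_step(self, s, a, s_next, done, reward_net):
        with torch.no_grad():
            r = reward_net(s, a)  # learned immediate reward replaces a hand-crafted signal
            probs = F.softmax(self.bc(s_next), dim=-1)
            # Only bootstrap over actions the behavior policy plausibly took.
            allowed = probs / probs.max(dim=-1, keepdim=True).values >= BCQ_THRESHOLD
            q_next = self.q_target(s_next).masked_fill(~allowed, -1e9).max(dim=-1).values
            target = r + GAMMA * (1.0 - done) * q_next
        q_sa = self.q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = F.smooth_l1_loss(q_sa, target) + F.cross_entropy(self.bc(s), a)
        self.opt.zero_grad(); loss.backward(); self.opt.step()
        # Periodically sync: self.q_target.load_state_dict(self.q.state_dict())
```

The design point the abstract emphasizes is the decoupling: the reward model is learned once from high-gain episodes rather than hand-specified, and the offline policy learner then consumes it unchanged, so any batch RL algorithm could in principle be swapped in at step 3.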


Acknowledgements

This work is supported by the U.S. Army Research Laboratory under cooperative agreement W911NF-15-2-0030. The statements and opinions expressed in this article do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.

Author information

Corresponding author

Correspondence to Fahmid Morshed Fahid.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Fahid, F.M., Rowe, J.P., Spain, R.D., Goldberg, B.S., Pokorny, R., Lester, J. (2022). Robust Adaptive Scaffolding with Inverse Reinforcement Learning-Based Reward Design. In: Rodrigo, M.M., Matsuda, N., Cristea, A.I., Dimitrova, V. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners’ and Doctoral Consortium. AIED 2022. Lecture Notes in Computer Science, vol 13356. Springer, Cham. https://doi.org/10.1007/978-3-031-11647-6_35

  • DOI: https://doi.org/10.1007/978-3-031-11647-6_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11646-9

  • Online ISBN: 978-3-031-11647-6

  • eBook Packages: Computer Science, Computer Science (R0)
