DOI: 10.1145/3583780.3614897

Graph Enhanced Hierarchical Reinforcement Learning for Goal-oriented Learning Path Recommendation

Published: 21 October 2023

Abstract

Goal-oriented learning path recommendation aims to recommend learning items (concepts or exercises) step by step to a learner so as to promote her mastery of specific learning goals. By formulating this task as a Markov decision process, reinforcement learning (RL) methods have demonstrated great power. Despite extensive research efforts, previous methods still fail to recommend effective goal-oriented paths because they under-utilize the goals, which is mainly reflected in two aspects: (1) the lack of goal planning. When a learner has multiple goals of different difficulties, previous methods cannot fully exploit the difficulties of, and dependencies between, the goals' learning items to plan the order in which the goals are achieved, making the path chaotic and inefficient; (2) the lack of efficiency in goal achieving. When pursuing a single goal, the path may contain learning items unrelated to that goal, which makes achieving it inefficient. To address these challenges, we present a novel Graph Enhanced Hierarchical Reinforcement Learning (GEHRL) framework for goal-oriented learning path recommendation. The framework divides learning path recommendation into two parts: sub-goal selection (planning) and sub-goal achieving (learning item recommendation). Specifically, we employ a high-level agent as a sub-goal selector that chooses sub-goals for a low-level agent to achieve, and the low-level agent recommends learning items to the learner. To restrict the path to goal-related learning items and thus improve the efficiency of achieving each goal, we develop a graph-based candidate selector that constrains the action space of the low-level agent according to the current sub-goal and the knowledge graph. We also develop a test-based internal reward for low-level training, which alleviates the sparsity of the external reward. Extensive experiments on three different simulators demonstrate that our framework achieves state-of-the-art performance.
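
To make the framework's structure concrete, below is a minimal sketch of the two-level loop the abstract describes, written under stated assumptions rather than from the authors' code: the StubLearner simulator, the MASTERED threshold of 0.5, the greedy sub-goal heuristic, and the random item choice are all illustrative stand-ins for the trained high-level and low-level policies; only the overall shape (sub-goal selection, a graph-constrained candidate set, and a test-based internal reward) follows the abstract.

    # A minimal, runnable sketch of a GEHRL-style two-level loop
    # (illustrative only, not the authors' implementation). Requires networkx.
    import random
    import networkx as nx

    MASTERED = 0.5  # assumed mastery threshold (illustration only)

    class StubLearner:
        """Hypothetical learner simulator: studying an item raises its mastery."""
        def __init__(self):
            self._mastery = {}
        def mastery(self, concept):
            return self._mastery.get(concept, 0.0)
        def study(self, concept):
            self._mastery[concept] = min(1.0, self.mastery(concept) + 0.4)

    def select_subgoal(kg, goals, learner):
        # High-level agent stand-in: pick the unmastered goal with the fewest
        # unmastered prerequisites (a greedy proxy for the trained selector).
        pending = [g for g in goals if learner.mastery(g) < MASTERED]
        if not pending:
            return None
        def cost(goal):
            return sum(1 for p in nx.ancestors(kg, goal)
                       if learner.mastery(p) < MASTERED)
        return min(pending, key=cost)

    def candidate_items(kg, subgoal):
        # Graph-based candidate selector: the low-level action space is
        # restricted to the sub-goal and its prerequisites in the graph.
        return nx.ancestors(kg, subgoal) | {subgoal}

    def recommend_item(kg, subgoal, learner):
        # Low-level agent stand-in: sample an unmastered, goal-related item
        # (a trained policy would score the constrained candidates instead).
        unmastered = [c for c in candidate_items(kg, subgoal)
                      if learner.mastery(c) < MASTERED]
        return random.choice(unmastered) if unmastered else subgoal

    # One episode; edges point from prerequisite concept to dependent concept.
    kg = nx.DiGraph([("algebra", "calculus"), ("calculus", "optimization"),
                     ("algebra", "probability")])
    learner, goals = StubLearner(), ["optimization", "probability"]
    for step in range(20):
        subgoal = select_subgoal(kg, goals, learner)
        if subgoal is None:
            break  # every goal mastered
        before = learner.mastery(subgoal)
        learner.study(recommend_item(kg, subgoal, learner))
        # Test-based internal reward: gain on a simulated test of the current
        # sub-goal, giving dense feedback while the external reward is sparse.
        internal_reward = learner.mastery(subgoal) - before
        print(step, subgoal, round(internal_reward, 2))

In the actual framework, trained RL policies replace the greedy and random stand-ins; the point of the constrained candidate set is that the low-level agent cannot recommend items unrelated to the current sub-goal, which is how the path stays goal-focused.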


Cited By

  • (2024) Item-Difficulty-Aware Learning Path Recommendation: From a Real Walking Perspective. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 10.1145/3637528.3671947, 4167-4178. Online publication date: 25-Aug-2024.
  • (2024) Privileged Knowledge State Distillation for Reinforcement Learning-based Educational Path Recommendation. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 10.1145/3637528.3671872, 1621-1630. Online publication date: 25-Aug-2024.
  • (2024) Individualised Mathematical Task Recommendations Through Intended Learning Outcomes and Reinforcement Learning. Generative Intelligence and Intelligent Tutoring Systems, 10.1007/978-3-031-63028-6_10, 117-130. Online publication date: 1-Jun-2024.


    Published In

    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023
    5508 pages
    ISBN: 9798400701245
    DOI: 10.1145/3583780


    Publisher

    Association for Computing Machinery, New York, NY, United States



    Author Tags

    1. hierarchical reinforcement learning
    2. learning path recommendation
    3. online education

    Qualifiers

    • Research-article


    Conference

    CIKM '23

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



    Article Metrics

    • Downloads (Last 12 months): 355
    • Downloads (Last 6 weeks): 51
    Reflects downloads up to 03 Jan 2025
