REDRL: A review-enhanced Deep Reinforcement Learning model for interactive recommendation

https://doi.org/10.1016/j.eswa.2022.118926

Highlights

  • Mining information in reviews and interaction data via a pretrained model.

  • Modeling long-term dynamic preferences of users accurately and discriminately.

  • Filtering irrelevant items and obtaining candidate items dynamically from a new angle.

  • Better interactive recommendation based on deep reinforcement learning.

Abstract

Recent advances in interactive recommender systems (IRS) have received wide attention due to their flexible recommendation strategies and optimization of users’ long-term utility. Given this interaction paradigm, researchers have made some attempts to incorporate reinforcement learning (RL) models into IRS, owing to the excellent ability of RL in long-term optimization and decision making. However, data sparsity is an intractable problem that most IRS urgently need to address. Although a small amount of work has exploited reviews to address data sparsity, these methods ignore the varying importance of items when modeling the user. In addition, most existing RL-based approaches suffer from decision-making difficulties when the action space becomes large. To solve the above problems, in this work we present a Review-enhanced Deep Reinforcement Learning model (REDRL) for interactive recommendation. Specifically, we utilize text reviews, combined with a pretrained review representation model, to acquire review-enhanced item embedding representations. Then we formalize the recommendation problem as a Markov Decision Process (MDP) and exploit deep reinforcement learning (DRL) to model the interactive recommendation. Notably, we introduce a multi-head self-attention technique to capture the distinct importance of different items in the behavior sequence, which is overlooked by existing work when modeling user preferences. In this way, we can model the long-term dynamic preferences of users accurately and discriminately for comprehensive interactive recommendation. Moreover, we combine the semantic structure information in the user–item bipartite graph with meta-paths in heterogeneous information networks (HIN) to filter irrelevant items and obtain candidate items dynamically. By this means, the size of the discrete action space is effectively reduced from a fresh perspective. Experimental results on three benchmark datasets demonstrate the effectiveness of our method, with significant improvements over state-of-the-art alternatives.

Introduction

In an age of explosive information growth, recommender systems (RS) are crucial for alleviating information overload. RS help users make decisions by displaying items (goods, services, news, information, etc.) that meet their needs as closely as possible, according to their purchasing behaviors and preferences. Recently, interactive recommender systems (IRS) have received wide attention from the research community. Unlike traditional RS, which treat the recommendation process as stationary and follow a fixed strategy, IRS sequentially recommend items to users and meanwhile update their recommendation policy based on users’ feedback. This interaction paradigm has been extensively adopted in popular systems such as Spotify, TikTok, and YouTube.

One way to implement IRS is through multi-armed bandit (MAB) methods (Li et al., 2010, Wang et al., 2017, Zeng et al., 2016), which focus on achieving a trade-off between exploration and exploitation. However, these methods assume that user preferences stay the same during the recommendation process, which contradicts the dynamic nature of IRS. Recently, there have been some attempts to incorporate reinforcement learning (RL) models into IRS (Lei and Li, 2019, Lei et al., 2020, Zhou et al., 2020, Zou et al., 2020), due to the excellent ability of RL in long-term optimization and decision making. However, most existing RL-based approaches suffer from decision-making difficulties when the action space becomes large, because the time complexity of decision making grows linearly with the size of the discrete action space.

For more efficient decision making, Wang, Guo, Li, Pan, and Li (2020) designed, on the basis of the Deep Deterministic Policy Gradient (DDPG), an efficient approach for constructing an action candidate set by dividing users into clusters and applying the idea of collaborative filtering. However, this method introduces the overhead of clustering users, and the quality of the candidate set depends on the accuracy of the clustering.

Simultaneously, most existing RL-based approaches consider only user–item rating data to model item characteristics and user preferences. However, relying on rating information alone easily traps the model in the data sparsity problem. It is well recognized that textual reviews contain a large amount of semantic knowledge, which can help mitigate data sparsity; as supplementary information, they are also easily accessible from many e-commerce and review websites. Thanks to the progress made by deep learning (DL) in natural language processing (NLP), there have been some successful works that leverage reviews to enhance recommendation performance (Wu, Quan, Li, and Ji, 2018, Wu et al., 2019, Zheng et al., 2017). But in the interactive recommendation scenario, most existing RL-based approaches either ignore review information or have difficulty incorporating it. Although Wang et al. (2020) put forward an Actor–Critic framework that alleviates data sparsity by leveraging textual information, they failed to model user preferences accurately and discriminately, because they ignored the varying importance of items for modeling the user. Furthermore, when modeling user preferences, most RL-based methods take the output (the last hidden state) of a recurrent neural network (RNN) as the representation of the user state. This approach ignores the different contributions that individual items in the behavior sequence make to the overall model.

Therefore, to solve the above problems, in this work, we present a Review-enhanced Deep Reinforcement Learning model (REDRL) for interactive recommendation.

Specifically, to reduce the impact of the sparsity problem, we utilize text reviews, combined with a pretrained review representation model, to acquire review-enhanced item embedding representations. Then we formalize the recommendation problem as a Markov Decision Process (MDP) and exploit deep reinforcement learning (DRL) to model the interactive recommendation. To address the fact that existing RL-based models ignore the distinct importance of different items in sequential behavior when modeling user preferences, we introduce a multi-head self-attention technique. In this way, we can model the long-term dynamic preferences of users accurately and discriminately for comprehensive interactive recommendation.
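The paper does not publish its layer sizes or pooling choice in this preview, so the following PyTorch sketch only illustrates the general idea of an attention-based state encoder: random tensors stand in for the review-enhanced item embeddings, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttentiveStateEncoder(nn.Module):
    """Pools a user's recent item sequence into a state vector with
    multi-head self-attention, so each interacted item contributes with
    its own learned weight instead of relying on an RNN's last hidden state."""

    def __init__(self, emb_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(emb_dim)

    def forward(self, item_embs: torch.Tensor) -> torch.Tensor:
        # item_embs: (batch, seq_len, emb_dim). In REDRL these would be the
        # review-enhanced item embeddings from the pretrained review model;
        # random tensors stand in for them below.
        out, _ = self.attn(item_embs, item_embs, item_embs)
        out = self.norm(out + item_embs)   # residual connection, then LayerNorm
        return out.mean(dim=1)             # mean-pool to a (batch, emb_dim) state

seq = torch.randn(1, 10, 64)               # stand-in embeddings of 10 recent items
state = AttentiveStateEncoder()(seq)
print(state.shape)                          # torch.Size([1, 64])
```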

Moreover, to solve the problem of large action spaces that arises when applying reinforcement learning to the recommendation domain, we combine the semantic structure information in the user–item bipartite graph with meta-paths in heterogeneous information networks (HIN) to filter irrelevant items and obtain candidate items dynamically from a fresh perspective. The main contributions of our work are summarized below:

  • We propose a novel Review-enhanced Deep Reinforcement Learning model for interactive recommendation, which comprehensively considers the rating information and review information to mitigate the problem of data sparsity. Moreover, by introducing the multi-head attention technique, it can characterize long-term dynamic preferences of users accurately and discriminately.

  • We combine the semantic structure information in the user–item bipartite graph with meta-paths in heterogeneous information networks, to filter irrelevant items and acquire candidate items dynamically from a fresh perspective (a toy sketch of this filtering follows the list). These potential candidate items not only reduce the scale of the discrete action space, but also increase the effectiveness of sampling for policy optimization.

  • We conduct experiments on three benchmark datasets. Experimental results demonstrate that our proposed model REDRL is superior to several state-of-the-art alternatives.
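The text above does not spell out which meta-paths REDRL uses or how candidates are scored, so the toy sketch below is only one plausible instantiation: it follows the item–user–item meta-path through the bipartite graph and ranks reachable items by path count; both the path choice and the count-based ranking are assumptions.

```python
from collections import Counter

def metapath_candidates(recent_items, item_users, user_items, k=100):
    """Walk the item-user-item meta-path over the user-item bipartite
    graph: hop from each recently interacted item to the users who also
    rated it, then to the other items those users rated. Items reached
    by more paths rank higher; unreachable items are filtered out."""
    scores = Counter()
    for i in recent_items:
        for u in item_users.get(i, ()):       # item -> user hop
            for j in user_items.get(u, ()):   # user -> item hop
                if j not in recent_items:
                    scores[j] += 1            # one more meta-path instance
    return [item for item, _ in scores.most_common(k)]

# Toy bipartite graph stored as adjacency sets in both directions.
item_users = {"i1": {"u1", "u2"}, "i2": {"u2"}, "i3": {"u3"}}
user_items = {"u1": {"i1", "i4"}, "u2": {"i1", "i2", "i5"}, "u3": {"i3"}}
print(metapath_candidates({"i1"}, item_users, user_items, k=10))
# e.g. ['i4', 'i2', 'i5'] (each reached by one path here; tie order may vary)
```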

Section snippets

Related work

In this section, we concisely review work related to our study from three different angles. The first is collaborative filtering; the second is reinforcement learning for recommendation; and the third is the use of review information for recommendation.

Problem formulation

In this section, we consider the recommendation process from a reinforcement learning perspective. In the scenario of interactive recommendation, the interaction between the IRS and the user can be modeled as the interaction between the agent and the environment in RL.

Fig. 1 illustrates the general scenario of the recommender–user interactions in the MDP. At every time step t, given the state s_t of the environment (user), the agent (recommender) chooses an action (item) a_t according to the policy
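As a minimal sketch of this agent–environment loop, the snippet below runs one episode with placeholder policy, candidate, and feedback functions; the reward and state-transition definitions here are illustrative stand-ins, not the paper's.

```python
import random

def run_episode(policy, candidate_fn, env_feedback, init_state, horizon=10):
    """One recommender-user episode in the MDP framing: at each step t the
    agent observes the state s_t, picks an item a_t from the current
    candidate set, receives the user's feedback as reward r_t, and the
    state transitions to s_{t+1}."""
    state, total_reward = init_state, 0.0
    for t in range(horizon):
        action = policy(state, candidate_fn(state))   # a_t chosen by the policy
        reward, state = env_feedback(state, action)   # r_t and s_{t+1}
        total_reward += reward
    return total_reward

# Placeholder environment: the state is the list of items recommended so far,
# rewards are random, and the policy picks a random candidate.
random_policy = lambda s, cands: random.choice(cands)
candidates = lambda s: [i for i in range(100) if i not in s]
feedback = lambda s, a: (random.random(), s + [a])
print(run_episode(random_policy, candidates, feedback, init_state=[]))
```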

The proposed model

In this section, we introduce our review-enhanced deep reinforcement learning model for the interactive recommendation scenario. Fig. 2 presents the architecture of our model, which includes four main modules: an action representation module, a state representation module, a candidate selection module, and a deep Q-learning network module. In the scenario of interactive recommendation, at each time step t, the agent consecutively takes an action a_t, i.e., recommends an item i_t to the user, and meanwhile
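As a hedged sketch of how the deep Q-learning network module can score a state against a pre-filtered candidate set, the snippet below uses a two-layer MLP over the concatenated state and candidate embeddings; the architecture and all dimensions are assumptions, not REDRL's reported design.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Scores Q(s, a) for a state vector and a batch of candidate item
    embeddings; the agent recommends the argmax candidate. Because the
    candidate set is pre-filtered, the argmax runs over a few dozen
    items rather than the full catalogue."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, state: torch.Tensor, cand_embs: torch.Tensor) -> torch.Tensor:
        # state: (dim,); cand_embs: (num_candidates, dim)
        s = state.expand(cand_embs.size(0), -1)       # repeat the state per candidate
        return self.mlp(torch.cat([s, cand_embs], dim=-1)).squeeze(-1)

q_net = QNetwork()
state, cands = torch.randn(64), torch.randn(50, 64)   # 50 pre-filtered candidates
best = torch.argmax(q_net(state, cands))              # index of the recommended item
print(int(best))
```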

Experiments

In this section, we employ three benchmark datasets to perform extensive experiments for performance evaluation. We then examine the effect of different parameters on the performance of our REDRL model. Finally, we analyze the impact of its important components on the performance of REDRL.

Conclusion

In this work, we present a Review-enhanced DRL model named REDRL for interactive recommendation. By utilizing text reviews, we first obtain review-enhanced item embedding representations via a pretrained review representation model. Then we formalize the recommendation problem as an MDP and leverage DRL to model the interactive recommendation. In REDRL, we combine the semantic structure information in the user–item bipartite graph with meta-paths in heterogeneous information networks, to

CRediT authorship contribution statement

Huiting Liu: Conceptualization, Methodology, Writing – review & editing. Kun Cai: Software, Validation, Formal analysis, Data curation, Writing – original draft, Visualization. Peipei Li: Investigation. Cheng Qian: Data curation. Peng Zhao: Resources. Xindong Wu: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research has been supported by the National Natural Science Foundation of China, No. 61976077, the Natural Science Foundation of Anhui Province, Nos. 2008085MF219 and 2108085MF212, and the Provincial Natural Science Foundation of Anhui Higher Education Institution of China, Nos. KJ2021A0040 and KJ2021A0043.

References (62)

  • Wang, X., et al. (2014). Exploration in interactive personalized music recommendation: A reinforcement learning approach. ACM Transactions on Multimedia Computing, Communications, and Applications.
  • Bai, T., Zou, L., Zhao, W. X., Du, P., Liu, W., Nie, J. Y., & Wen, J. R. (2019). CTrec: A long-short demands evolution...
  • Chen, H., et al. Large-scale interactive recommendation with tree-structured policy gradient.
  • Chen, X., Xu, H., Zhang, Y., Tang, J., Cao, Y., Qin, Z., & Zha, H. (2018). Sequential recommendation with user memory...
  • Chen, T., Yin, H., Ye, G., Huang, Z., Wang, Y., & Wang, M. (2020). Try this instead: Personalized and interpretable...
  • Chen, J., Zhang, H., He, X., Nie, L., Liu, W., & Chua, T. S. (2017). Attentive collaborative filtering: Multimedia...
  • Cheng, H. T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., & Shah, H. (2016). Wide & deep learning for...
  • Christakopoulou, K., Radlinski, F., & Hofmann, K. (2016). Towards conversational recommender systems. In Proceedings of...
  • Dong, Y., Chawla, N. V., & Swami, A. (2017). metapath2vec: Scalable representation learning for heterogeneous networks....
  • Dulac-Arnold, G., et al. (2015). Deep reinforcement learning in large discrete action spaces.
  • Gu, Y., Ding, Z., Wang, S., & Yin, D. (2020). Hierarchical user profiling for e-commerce recommender systems. In...
  • Hammou, B. A., et al. (2019). An effective distributed predictive model with matrix factorization and random forest for Big Data recommendation systems. Expert Systems with Applications.
  • Hausknecht, M., & Stone, P. (2015). Deep recurrent Q-learning for partially observable MDPs. In 2015 AAAI fall symposium...
  • He, X., He, Z., Du, X., & Chua, T. S. (2018). Adversarial personalized ranking for recommendation. In The 41st...
  • He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T. S. (2017). Neural collaborative filtering. In Proceedings of the...
  • He, R., & McAuley, J. (2016). Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative...
  • Hidasi, B., et al. (2015). Session-based recommendations with recurrent neural networks.
  • Ie, E., Jain, V., Wang, J., Narvekar, S., Agarwal, R., Wu, R., & Boutilier, C. (2019). SlateQ: A tractable decomposition...
  • Kang, W. C., et al. Self-attentive sequential recommendation.
  • Kawale, J., et al. (2015). Efficient Thompson sampling for online matrix-factorization recommendation. Advances in Neural Information Processing Systems.
  • Konstan, J. A., et al. (1997). GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM.
  • Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of...
  • Lei, Y., et al. (2019). Interactive recommendation with user-specific deep reinforcement learning. ACM Transactions on Knowledge Discovery from Data.
  • Lei, Y., Pei, H., Yan, H., & Li, W. (2020). Reinforcement learning based recommendation with graph convolutional...
  • Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article...
  • Li, J., Ren, P., Chen, Z., Ren, Z., Lian, T., & Ma, J. (2017). Neural attentive session-based recommendation. In...
  • Li, S., Zhou, J., Xu, T., Liu, H., Lu, X., & Xiong, H. (2020). Competitive analysis for points of interest. In...
  • Liang, D., Krishnan, R. G., Hoffman, M. D., & Jebara, T. (2018). Variational autoencoders for collaborative filtering. In...
  • Liu, F., et al. (2018). Deep reinforcement learning based recommendation with explicit user-item interactions modeling.
  • Mnih, V., et al. (2013). Playing Atari with deep reinforcement learning.
  • Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature.