
Knowledge-Based Systems

Volume 227, 5 September 2021, 107217

Multi-modal Knowledge-aware Reinforcement Learning Network for Explainable Recommendation

https://doi.org/10.1016/j.knosys.2021.107217

Abstract

Knowledge graphs (KGs) can provide rich, structured information for recommendation systems, increasing accuracy and enabling explicit reasoning. Deep reinforcement learning (RL) has also sparked great interest in personalized recommendation. Combining the two holds promise for carrying out interpretable causal inference and improving the performance of graph-structured recommendation. However, most KG-based recommendation methods focus on the rich semantic relationships between entities in a heterogeneous knowledge graph and thus fail to fully exploit the image information associated with each entity. To address these issues, we propose a novel Multi-modal Knowledge-aware Reinforcement Learning Network (MKRLN), which couples recommendation and interpretability by providing actual paths in a multi-modal KG (MKG). The MKRLN generates path representations by composing the structural and visual information of entities, and infers the underlying rationale of agent-MKG interactions by leveraging the sequential dependencies within a path in the MKG. In addition, because KGs contain many attributes and entities, combining them with RL yields very large action and state spaces, which complicates the search over actions. To solve this problem, we propose a new hierarchical attention-path, which focuses users' attention on the items they are interested in. This reduces the relations and entities in the KG, which in turn shrinks the action and state spaces in RL, shortens the path to the target entity, and improves recommendation accuracy. Our model provides explicit explanations in terms of both knowledge and images. Finally, we extensively evaluated our model on several large-scale real-world benchmark datasets, and it yielded favorable results compared with state-of-the-art methods.

Introduction

With the explosive growth of online content and services, recommendation systems play an increasingly important role in matching user needs with various online resources. Knowledge graphs (KGs) are used as an auxiliary resource in recommendation systems, helping to exploit various types of structured information to improve recommendation performance and enhance the interpretability of the recommendation model [1]. In general, KG-aware recommendation is two-fold. First, some approaches focus on using knowledge-graph embeddings to make personalized recommendations, such as TransE [2], node2vec [3], and Metapath2Vec [4]. These approaches adopt knowledge-graph embedding to compute the similarity between items and users for Top-N item recommendation [5], [6]. However, these methods only match the similarity between a user and an item and do not produce an interpretable reasoning process. Second, because knowledge-graph embedding methods do not produce explanations, some recommendation systems instead derive reasonable explanations through KG paths. For example, Wang et al. [7] proposed the knowledge-aware path recurrent network (KPRN) to perform effective reasoning over paths and infer the underlying rationale of a user-item interaction. Xian et al. [8] further extended explainable recommendation by proposing policy-guided path reasoning, which formally defines and interprets the reasoning process.
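To make the embedding-based family concrete, the following is a minimal sketch (our own illustration, not the cited implementations) of TransE-style scoring for Top-N recommendation: a triple (head, relation, tail) is modeled as head + relation ≈ tail, and candidate items are ranked by the negative distance to user + relation. The `interacts` relation and the toy vectors are assumed for illustration.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray) -> float:
    """Higher score = more plausible triple under the TransE assumption h + r ≈ t."""
    return -float(np.linalg.norm(h + r - t, ord=1))

def top_n_items(user_vec, interact_rel, item_vecs, n=2):
    """Rank candidate items for a user via a hypothetical 'interacts' relation."""
    scores = [transe_score(user_vec, interact_rel, v) for v in item_vecs]
    order = np.argsort(scores)[::-1]  # best (least distant) items first
    return list(order[:n])

# Toy vectors: item 0 sits exactly at user + relation, so it ranks first.
user = np.array([0.1, 0.2])
rel = np.array([0.3, 0.0])
items = [np.array([0.4, 0.2]), np.array([1.0, 1.0]), np.array([0.5, 0.1])]
print(top_n_items(user, rel, items))  # → [0, 2]
```

Note that the ranking alone gives no account of *why* item 0 is preferred, which is exactly the interpretability gap the path-based methods address.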

As such, there is great value in exploiting KGs for explainable recommendation systems. However, these methods inject KGs to enrich the representation of the recommendation problem but ignore the visual traits of items and multi-modal information. Further, with the growing volume of online images, content-based image decisions have come into play. For example, when browsing a movie on a website (Fig. 1), users typically look at the movie's poster first; then they read the text content and learn that the film is a story about a white driver, Tony, and a black musician, Don. Only then do they decide whether to watch the movie. This indicates that images carry a great deal of latent knowledge-level connections, which benefits recommendation systems.

Therefore, it is natural to combine the semantic and image modalities. Indeed, many efforts linking images and text have shown promising results and can be applied to recommendation systems, such as visual-semantic embedding and multi-modal correlation learning. Recently, researchers have explored the potential of multi-modal recommendation in greater depth. For example, Yu et al. [9] proposed a vision-language recommendation model, which enables users to provide natural language feedback on visual products. Zhang et al. [10] proposed the Joint Representation Learning (JRL) heterogeneous recommendation system. However, these models use representations of text and images that cannot produce an interpretable reasoning process.

In contrast to existing single-modal recommendation methods, such as those based purely on KGs [5], [11], we propose a multi-modal method for explainable recommendation. The agent starts from a user and conducts a multi-hop logical path over the MKG to discover suitable items to recommend to the target user. Because the agent recommends items based on a logical path, the reasoning process over the MKG that leads to each recommendation is easy to interpret. Thus, the system can provide two kinds of causal evidence, visual and knowledge-based, in support of the recommended items. Accordingly, the aim of our system is not only to recommend candidate items to the user but also to provide the corresponding explanatory logic paths in the MKG. These paths contain visual and knowledge information that serves as interpretable evidence for why a given recommendation is made.
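The following is an illustrative sketch (our own, not the paper's code) of how a reasoning path over the MKG doubles as an explanation: each hop is a (relation, entity) step taken by the agent, and the full path is rendered as a human-readable justification. The movie and director names are hypothetical placeholders.

```python
def render_path(start_user, hops):
    """hops: list of (relation, entity) pairs walked by the agent."""
    parts = [start_user]
    for rel, ent in hops:
        parts.append(f"--{rel}--> {ent}")
    return " ".join(parts)

path = [("watched", "Green Book"), ("directed_by", "Peter Farrelly"),
        ("directed", "Dumb and Dumber")]
print(render_path("user_1", path))
# → user_1 --watched--> Green Book --directed_by--> Peter Farrelly --directed--> Dumb and Dumber
```

A path like this explains the recommendation from the knowledge side; attaching each entity's image along the path supplies the visual side of the evidence.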

Considering the shortcomings of previous work, and inspired by the wide application of images, KGs, and deep reinforcement learning (RL), we propose the multi-modal knowledge-aware reinforcement learning network (MKRLN), a deep RL model that incorporates multiple modalities for multi-dimensional explanation and reasoning. We also design a novel hierarchical attention-path over KGs, which substantially reduces the action space and filters noise. The proposed recommendation approach has three advantages. First, we can provide explanations from both the visual and the knowledge aspect, which are complementary. The agent starts from a user (i.e., an entity it is linked to) and searches for suitable items along paths over the KG. These multi-step paths provide logical reasons and deep explanations of how items are recommended to the user: the image explains the recommendation from a visual perspective, and the knowledge explains it from an external-knowledge perspective. Second, in a typical KG, one entity can be linked to a large number of neighbors with the same attributes. We therefore propose attention neighbors and attention-paths to greatly reduce the size of the action space and the number of entities. Third, faced with a large number of items, users have difficulty focusing on the items they care most about, so the attention-path mechanism explores the user's real preferences and filters out redundant information.
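The action-space pruning idea behind the attention neighbors can be sketched as follows. This is a hedged simplification of ours, not the paper's model: each outgoing (relation, entity) edge is scored against a user-interest vector with a plain dot product, softmax-normalized into attention weights, and only the top-k edges are kept as the RL agent's candidate actions.

```python
import numpy as np

def prune_actions(user_interest, neighbors, k=2):
    """neighbors: list of (relation, entity_id, entity_vec). Returns top-k edges by attention."""
    logits = np.array([float(user_interest @ vec) for _, _, vec in neighbors])
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                      # softmax attention weights over edges
    keep = np.argsort(attn)[::-1][:k]       # indices of the k most-attended edges
    return [(neighbors[i][0], neighbors[i][1]) for i in keep]

u = np.array([1.0, 0.0])
nbrs = [("directed_by", "e1", np.array([0.9, 0.1])),
        ("has_genre",  "e2", np.array([0.1, 0.9])),
        ("starred_by", "e3", np.array([0.8, 0.3]))]
print(prune_actions(u, nbrs, k=2))  # keeps the two edges most aligned with u
```

Shrinking the per-step candidate set from all neighbors to k edges is what shortens the agent's search and keeps the resulting paths focused on the user's interests.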

Our contributions are summarized as follows:

  • We propose a multi-modal KG combined with deep RL for personalized recommendation. The model can explain its logical reasoning from both the visual and the knowledge aspect, making the explanation multi-dimensional.

  • We design a novel hierarchical attention-path over multi-modal KGs, which greatly reduces the action space and the number of entities and filters out noise, allowing users to focus on the items they care about most.

  • We highlight the significance of multi-modal KGs for recommendation systems, enabling recommendation at a higher knowledge level with more explicit reasoning about image content using external information.


Recommendation with knowledge graph

In recent years, researchers have explored the potential of knowledge-graph reasoning in recommendation systems. A series of studies focused on the use of knowledge-graph embedding models to make recommendations [4], [5]. Another research direction is to make interpretable recommendations based on entity and path information in a knowledge graph. For example, Ai et al. [12] proposed a collaborative filtering (CF) method over knowledge-graph embeddings for improving personalized recommendation.

Framework

A multi-modal knowledge graph is defined as G_M = (E, R, G), where E is the entity set, R is the relation set, and G = {(e, r, e′) | e, e′ ∈ E, r ∈ R} is the set of KG triples. Each triple (e, r, e′) represents a fact with relation r from the head entity e to the tail entity e′. Let V be the set of images, with each image v ∈ V. Each entity e is associated with a corresponding image v. The images describe the appearance of the entities, enriching the entities' representations with their hidden semantics.
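A minimal in-memory sketch of this definition is shown below: triples (e, r, e′) stored as an adjacency map, plus an entity-to-image mapping for the visual modality. This is our own illustration; the entity names and image path are placeholders, not the paper's datasets.

```python
from collections import defaultdict

class MultiModalKG:
    def __init__(self):
        self.entities = set()          # E: entity set
        self.relations = set()         # R: relation set
        self.adj = defaultdict(list)   # e -> [(r, e')]: the triples G
        self.images = {}               # e -> image reference (visual modality)

    def add_triple(self, head, rel, tail):
        self.entities.update({head, tail})
        self.relations.add(rel)
        self.adj[head].append((rel, tail))

    def attach_image(self, entity, image_ref):
        self.images[entity] = image_ref

kg = MultiModalKG()
kg.add_triple("user_1", "watched", "Green Book")
kg.add_triple("Green Book", "directed_by", "Peter Farrelly")
kg.attach_image("Green Book", "posters/green_book.jpg")
print(kg.adj["user_1"])   # → [('watched', 'Green Book')]
```

The adjacency map is what the agent queries at each hop for candidate actions, while `images` supplies the visual features that enrich each entity's representation.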

Experiments

We extensively evaluated the performance of our model on real-world datasets. We first introduce the benchmarks used in our experiments and the corresponding experimental settings. Then, we quantitatively compare the effectiveness of our model with other state-of-the-art methods and conduct ablation studies to show how parameter variations influence our model.

Conclusion

We believe that in the future intelligent agents should have the ability to perform explicit reasoning over knowledge and images for decision-making. In this paper, we proposed an end-to-end framework based on the interaction of deep reinforcement learning and multi-modal knowledge graph to automatically model the recommendation system for recommendation with interpretation. To achieve this, we built a multi-modal knowledge graph and learned the representations of the entities and images within

CRediT authorship contribution statement

Shaohua Tao: Conceptualization, Methodology, Software. Runhe Qiu: Supervision, Writing - original draft. Yuan Ping: Writing - review & editing. Hui Ma: Investigation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the National Key R&D Program of China under Grant 2017YFB0802000, the National Natural Science Foundation of China under Grant U1736111, the Plan for Scientific Innovation Talent of Henan Province, China, under Grant 184100510012, the Key Technologies R&D Program of Henan Province, China, under Grant 212102210084, and the Innovation Scientists and Technicians Troop Construction Projects of Henan Province Key Technologies R&D Program, China, under Grant No.

References (28)

  • Y.-F. Zhang et al., Explainable recommendation: A survey and new perspectives (2018)
  • A. Bordes, N. Usunier, A.G. Duran, J. Weston, et al., Translating embeddings for modeling multi-relational data, in: ...
  • A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD ...
  • Y.-X. Dong, N.-V. Chawla, A. Swami, metapath2vec: Scalable representation learning for heterogeneous networks, in: ...
  • F.-Z. Zhang, N.-J. Yuan, D. Lian, X. Xie, et al., Collaborative knowledge base embedding for recommender systems, in: ...
  • E. Palumbo, G. Rizzo, R. Troncy, Entity2rec: Learning user-item relatedness from knowledge graphs for top-N item ...
  • X. Wang et al., Explainable reasoning over knowledge graphs for recommendation (2018)
  • Y.-K. Xian, Z.-H. Fu, S. Muthukrishnan, G.-D. Melo, et al., Reinforcement knowledge graph reasoning for explainable ...
  • Y. Tong, Y.-L. Lin, R.-Y. Zhang, X.-Y. Zeng, et al., Vision-language recommendation via attribute augmented multimodal ...
  • Y.-F. Zhang, Q.-Y. Ai, X. Chen, W.-B. Cro, Joint representation learning for top-N recommendation with heterogeneous ...
  • X.-T. Wang, Y.-R. Chen, J. Yang, L. Wu, et al., A reinforcement learning framework for explainable recommendation, in: ...
  • Q.-Y. Ai et al., Learning heterogeneous knowledge base embeddings for explainable recommendation, Algorithms (2018)
  • L.-Z. Liao, Y.-S. Ma, X.-N. He, R.-C. Hong, et al., Knowledge-aware multimodal dialogue systems, in: Proceedings of the ...
The code (and data) in this article has been certified as Reproducible by Code Ocean (https://codeocean.com/). More information on the Reproducibility Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.