Elsevier

Knowledge-Based Systems

Volume 203, 5 September 2020, 106119

Tag-informed collaborative topic modeling for cross domain recommendations

https://doi.org/10.1016/j.knosys.2020.106119

Abstract

Collaborative topic modeling alleviates data sparsity in recommender systems by combining collaborative filtering with topic models. However, sufficient textual data is not always available. Tags, which serve as supplementary descriptions of items, can reflect users’ interests in item attributes. Previous works, however, only mine the effect of tags on ratings within one domain and ignore the fact that items in related domains can be related in attributes. Tags encode similar properties of items and can be transferred across domains to mutually benefit recommendations in both domains. In this study we propose TagCDCTR (Tag-informed Cross Domain Collaborative Topic Regression), a model that exploits shared tags as bridges to link related domains through an extended collaborative topic modeling framework. The model exploits inter-domain relations by encoding cross domain item–item similarity based on common tags and by jointly learning a shared set of topics from all domains. It collectively factorizes the rating matrices of multiple domains into common user latent factors and domain-specific item latent factors, so that the learned item latent factors are linked through the inter-domain relations, which helps to capture the items more comprehensively. The rich information reused across multiple domains alleviates data sparsity, and the semantic advantage of topics and tags provides better interpretability of recommendations. Experiments conducted on three datasets demonstrate that TagCDCTR outperforms state-of-the-art collaborative-topic-based models and cross-domain-based models.

Introduction

Recommender Systems (RS) aim to manage information overload by helping people identify the products or services that best fit their tastes. Collaborative Filtering (CF) is one of the most successful approaches in recommender systems [1] because it relies mainly on past user behavior and requires no extensive data collection. However, CF models characterize items using relatively few user ratings and thus suffer from the sparsity problem. To alleviate this difficulty, collaborative topic modeling has become popular in recommendation because it combines CF methods with topic models. It fits a model that uses the latent themes to explain both the observed ratings and the observed words [2]. However, collaborative topic modeling mainly makes use of ratings and text, and sufficient textual data is not always available. Recommendation performance could be further improved by fully exploiting additional data.

Currently, many recommender systems allow users to tag items along with ratings to describe their interests in item attributes. Tags serve as supplementary descriptions of items, and a user's rating for an item may be affected by the tags the user attached to it. A user and the items that he or she has tagged tend to share similar latent features [3]. For example, if a user gave a movie a high rating and tagged it with “romance”, “Leonardo” and “Oscar”, we can infer that the user may prefer romantic movies, especially those starring Leonardo and honored with an Oscar; we can also infer that the movie is a romance that may star Leonardo and may have won an Oscar. Tags contain both factual information about items and subjective information about users, which may be very useful for recommendation. However, previous works [3], [4], [5] only capture the tags in one domain, and few of them address the problem via cross domain techniques, whereas in real-world scenarios we can easily find related CF domains that recommend items similar to those of the target domain [6].

Cross-Domain Collaborative Filtering (CDCF) methods have been proposed to relieve the sparsity problem in each individual domain by transferring useful knowledge from other domains [7]. The rationale for knowledge transfer across domains is that, in related domains, users can be related in interests and items can be related in attributes [6]. In this paper, we consider the notion of domain at the attribute level, which means the recommended items are of the same type and have the same attributes. Two items are considered to belong to different domains if they differ in the value of a certain attribute [8]. For instance, comedy movies and drama movies are in different domains because they belong to different genres. Furthermore, we consider scenarios where the domains share the same aligned users, which means the users have rated items in several domains [9]. This assumption commonly holds in the real world. The key challenge is how to relate these domains to each other so that knowledge is transferred effectively.

In related domains, tags represent users’ preferences and encode the similar properties of items. Shared users with the same patterns may use the same set of tags to annotate items, so the tags may act as a common vocabulary between domains [9]. Fig. 1 shows an example of a cross domain movie tagging system. The system consists of two domains: item v1 is in D1 (comedy genre), and items v2 and v3 are in D2 (drama genre). There are two users and six distinct tags in total, shared across domains. u1 assigns the tags “Leonardo”, “romance” and “psychological” to v1, and the tags “Leonardo”, “romance” and “Oscar” to v3, indicating that v1 and v3 may share the properties “romance” and “Leonardo”. u2 assigns the tags “mystery”, “adventure” and “psychological” to v2. Single-domain CF models cannot capture the correlation between v2 and v3 because their tags do not overlap. In the cross domain scenario, however, “psychological” is the overlap between v1 and v2, and “Leonardo” and “romance” are the overlap between v1 and v3. Because v1 is linked to both v2 and v3, the underlying correlation between v2 and v3 can be captured via knowledge transfer techniques and spread among all items, contributing to more comprehensive item representations.
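The tag overlaps in the Fig. 1 example can be checked with a short sketch. The tag assignments below are taken from the example; the `common_tags`, `jaccard` and `cross_domain_pairs` helpers are hypothetical names, and the use of Jaccard similarity is an assumption made for illustration only, since the similarity measure actually used by TagCDCTR is not given in this excerpt.

```python
from itertools import combinations

# Tag assignments from the Fig. 1 example: item -> (domain, set of tags).
item_tags = {
    "v1": ("D1", {"Leonardo", "romance", "psychological"}),
    "v2": ("D2", {"mystery", "adventure", "psychological"}),
    "v3": ("D2", {"Leonardo", "romance", "Oscar"}),
}


def common_tags(a, b):
    """Return the tags shared by items a and b."""
    return item_tags[a][1] & item_tags[b][1]


def jaccard(a, b):
    """Jaccard similarity of two items' tag sets (an illustrative choice)."""
    tags_a, tags_b = item_tags[a][1], item_tags[b][1]
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def cross_domain_pairs():
    """List item pairs whose members belong to different domains."""
    return [
        (a, b)
        for a, b in combinations(sorted(item_tags), 2)
        if item_tags[a][0] != item_tags[b][0]
    ]


for a, b in combinations(sorted(item_tags), 2):
    print(a, b, sorted(common_tags(a, b)), round(jaccard(a, b), 2))
```

Under these assumptions, v2 and v3 have no common tags, while v1 shares tags with both of them, which is the indirect link the cross domain setting relies on.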

In this paper, we investigate how to exploit tags as bridges to improve collaborative topic modeling of multiple domains simultaneously. Specifically, we encode cross domain item–item similarity in the form of common tags, and use topic models to extract a shared set of topics from the item documents of all domains together. The learned information is then incorporated into multiple probabilistic matrix factorizations as inter-domain links in a collective way, so that the user and item latent factors are learned jointly for both domains. Fully exploiting the ratings, text and tags yields a reliable and meaningful bridge between domains, so that the two domains benefit each other.
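The idea of factorizing several rating matrices with common user factors and domain-specific item factors can be illustrated with a minimal sketch. This is not the TagCDCTR learning procedure: it uses plain stochastic gradient descent on a regularized squared error, omits the topic and tag terms entirely, and all names (`collective_mf`, its parameters, and the toy ratings) are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def collective_mf(ratings_by_domain, n_users, n_items_by_domain,
                  k=2, lr=0.05, reg=0.05, epochs=300, seed=0):
    """Factorize several rating matrices with one shared user factor matrix.

    ratings_by_domain: {domain: [(user, item, rating), ...]}
    Returns (U, V, history), where U holds the shared user factors,
    V[domain] holds that domain's item factors, and history records the
    summed squared error observed during each epoch.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = {d: [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
         for d, n in n_items_by_domain.items()}
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for d, triples in ratings_by_domain.items():
            for u, i, r in triples:
                err = r - dot(U[u], V[d][i])
                sq_err += err * err
                for f in range(k):
                    uf, vf = U[u][f], V[d][i][f]
                    # The same U[u] is updated by ratings from every domain.
                    U[u][f] += lr * (err * vf - reg * uf)
                    V[d][i][f] += lr * (err * uf - reg * vf)
        history.append(sq_err)
    return U, V, history


# Toy data: two aligned users who rated items in two domains.
ratings = {
    "comedy": [(0, 0, 5.0), (1, 0, 2.0)],
    "drama": [(0, 0, 4.0), (0, 1, 5.0), (1, 1, 1.0)],
}
U, V, history = collective_mf(
    ratings, n_users=2, n_items_by_domain={"comedy": 1, "drama": 2})
print(history[0], history[-1])
```

Because each user has a single factor vector that is updated by ratings from both domains, evidence from one domain shapes predictions in the other. In the model described above, the item factors would additionally be tied together through shared topics and tag-based similarity.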

The main contributions of this paper are summarized as follows. (1) TagCDCTR takes advantage of the factual and subjective information of tags across domains, thus capturing item features more comprehensively. Tags serve as supplementary item descriptions and are able to encode similar properties of items across domains. (2) TagCDCTR improves the interpretability of recommendations. First, TagCDCTR provides an interpretable latent representation of items from text by assuming that the item latent vector is close to the topic proportion. Second, it represents user preferences in terms of topic interests. The model also uses tags to describe similar properties of items across domains. Fused in a cross-domain framework, the semantics of topics and tags serve as a summary of items and capture their characteristics, helping to explain the recommendations. (3) TagCDCTR reuses information via knowledge transfer and alleviates data sparsity in each individual domain more effectively. We fuse all three kinds of information in a unified cross domain framework, so the model is able to make recommendations from shared tags when ratings and text are relatively scarce. (4) The experiments conducted on three datasets demonstrate that our model outperforms some state-of-the-art models and show that knowledge transfer is mutually beneficial for both domains.
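For context, the assumption that an item latent vector is close to its topic proportion can be written as in the standard collaborative topic regression formulation of Wang and Blei (2011). The exact TagCDCTR formulation is not given in this excerpt and may differ; the symbols below (item vector v_j, topic proportion θ_j, user vector u_i, precision parameters λ_v and c_ij, and number of topics K) follow that standard formulation and are assumptions here.

```latex
v_j = \theta_j + \epsilon_j, \qquad
\epsilon_j \sim \mathcal{N}\left(0, \lambda_v^{-1} I_K\right), \qquad
r_{ij} \sim \mathcal{N}\left(u_i^{\top} v_j, \; c_{ij}^{-1}\right)
```

A larger λ_v keeps v_j closer to the topic proportion θ_j, which is what makes the item representation readable in terms of topics.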

Section snippets

Related works

The works related to our study mainly concern Collaborative Topic Modeling and Cross Domain Collaborative Filtering.

Tag-informed cross domain collaborative topic regression

In this section we describe the proposed TagCDCTR in detail. We first characterize the cross domain item–item similarity based on common tags. We then introduce the central mechanism of model formulation and parameter learning.

Experiments

In this section, we investigate whether the recommendation performance for each individual domain can be improved by transferring knowledge from related domains. We compare TagCDCTR with some well-known approaches on both of the combined domains, then examine the ability of TagCDCTR to address data sparsity, and finally discuss the impact of the parameters.

Conclusions

In this paper, we present TagCDCTR, a novel framework that extends collaborative topic modeling to a cross domain context by exploiting tags as inter-domain links. We utilize not only ratings and text, but also the factual and subjective information of tags across domains to estimate the item latent factors more comprehensively. We also use shared tags as supplementary item descriptions to encode similar properties of items across domains, then use such semantic advantage of tags and topics learned

CRediT authorship contribution statement

Jiaqi Wang: Conceptualization, Methodology, Writing - original draft. Jing Lv: Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We are grateful to Chaochao Chen, the first author of reference [3]. His code for the TRCF algorithm was helpful to our experiments. This work was supported by the Philosophy and Social Science Research in Colleges and Universities in Jiangsu Province (No. 2019SJA0237) and the Educational Reform Project of Nanjing Normal University.

References (49)

  • Chen C. et al. Capturing semantic correlation for item recommendation in tagging systems

  • Wang H. et al. Collaborative topic regression with social regularization for tag recommendation

  • Li B. Cross-domain collaborative filtering: A brief survey

  • Cantador I. et al. Cross-domain recommender systems

  • Enrich M. et al. Cold-start management with cross-domain collaborative filtering and tags

  • Salakhutdinov R. et al. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo

  • Mnih A. et al. Probabilistic matrix factorization

  • Blei D.M. et al. Latent Dirichlet allocation. J. Mach. Learn. Res. (2003)

  • Chang J. et al. Reading tea leaves: how humans interpret topic models. Neural Inf. Process. Syst. (2009)

  • Agarwal D. et al. fLDA: matrix factorization through latent Dirichlet allocation

  • Koren Y. et al. Matrix factorization techniques for recommender systems. Computer (2009)

  • Koren Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model

  • Purushotham S. et al. Collaborative topic regression with social matrix factorization for recommendation systems

  • Ding X. et al. Celebrity recommendation with collaborative social topic regression
