Elsevier

Neurocomputing

Volume 127, 15 March 2014, Pages 124-134
Cross domain recommendation based on multi-type media fusion

https://doi.org/10.1016/j.neucom.2013.08.034

Abstract

Due to the scarcity of user interest information in the target domain, recommender systems generally suffer from the sparsity problem. To alleviate this limitation, one natural way is to transfer user interests from other domains to the target domain. However, objects in different domains may be of different media types, which makes it very difficult to find correlations between them. In this paper, we propose a Bayesian hierarchical approach based on Latent Dirichlet Allocation (LDA) to transfer user interests across domains or media. We model documents (corresponding to media objects) from different domains and user interests in a common topic space, and learn topic distributions for documents and user interests together. Specifically, to learn the model, we combine multiple types of media information: media descriptions, user-generated text data and ratings. With this model, recommendation can be done in multiple ways, for example by predicting ratings or by directly comparing the topic distributions of documents and user interests. Experiments on two real-world datasets demonstrate that our proposed method is effective in addressing the sparsity problem by transferring user interests across domains.

Introduction

Recommender systems attempt to suggest items that target users are likely to be interested in. The most representative recommendation method is Collaborative Filtering (CF), which predicts the preference of a user by combining the feedback of other users with similar interests. Even though CF methods have achieved great success in practical applications, some problems still limit their performance. One main limitation is the well-known sparsity problem [1], [2]. That is, when some users access only a few items, or some items are used by only a few users, it is difficult to predict user interests and overfitting can easily occur.
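To make the CF idea and the sparsity problem concrete, the following is a minimal sketch of user-based collaborative filtering. It is an illustration only, not the method proposed in this paper: the cosine similarity measure, the function names and the toy ratings are all assumptions introduced here.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the items that both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item, which is
    exactly what happens when the data is too sparse.
    """
    num = 0.0
    den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den > 0 else None


# Hypothetical toy data: "m*" are movies, "b*" are books.
ratings = {
    "alice": {"m1": 5, "m2": 4},
    "bob": {"m1": 5, "m2": 4, "b1": 5},
    "carol": {"m3": 2, "b1": 1},
    "dave": {},  # a user with no feedback at all
}

print(predict_rating(ratings, "alice", "b1"))  # driven by bob, who shares alice's movie ratings
print(predict_rating(ratings, "dave", "b1"))   # None: no overlap with any other user
```

In this toy setting, a user without overlapping feedback receives no prediction at all, which is the situation that auxiliary-domain data is meant to relieve.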

To alleviate the sparsity problem, auxiliary data, such as users' explicit and implicit feedback in other domains, can be used. Fig. 1 shows an example. Consider a scenario in which users leave little preference information (e.g., ratings and comments) on books but much more on movies. It is difficult to recommend books to these users based only on their feedback on books, since their interest data for books is limited (i.e., the data is sparse). Fortunately, we can transfer user interests from movies to books. Intuitively, if users like the movie Harry Potter, they may also like the book Harry Potter. As a more meaningful example, if users have watched many science fiction films, they may be interested in books of a similar style.

The key question is how to transfer user interests across domains, even when the domains involve different media types. Some researchers have proposed transfer learning methods to solve this problem [3], [4]. They assume that the rating matrices in different domains share similar cluster-level rating patterns, and treat these patterns as candidates to be transferred from the auxiliary domain. Although these methods can alleviate the sparsity problem to some extent, they have two major limitations. Firstly, they require the data in both the auxiliary domain and the target domain to be standardized and structured, that is, to take the form of rating matrices. In practice this requirement cannot always be met. Secondly, these methods are hard to extend to exploit other kinds of information, such as media content and user-generated text data.

In practical applications, various kinds of information can be utilized to transfer user interests. For instance, on E-commerce websites such as Amazon, recommender system designers may be interested in transferring user interests across commodity categories (e.g., from electronic products to books). In this example, users' comments can be used to mine user interests, description text can be used to build correlations between commodities from different categories, and so on. In the example of Fig. 1, the useful information includes the content of the media itself, the media description text, some kinds of metadata and user-generated information. To transfer user interests based on these types of information, a natural solution is to build correlations or similarities between objects from different domains. However, it is difficult to obtain correlations that are consistent with user interests, especially for objects of different media types.

In this paper, we propose a Bayesian hierarchical approach based on Latent Dirichlet Allocation (LDA) for cross domain recommendation. We model documents (we consider all types of objects, such as movies and books, as documents) from different domains and user interests in a common topic space. The topic distributions for documents and user interests are learned simultaneously, and the topic distribution of a particular document is built on the document content as well as on user interests. The correlations among different types of media can then be constructed from the topic distributions. Since we incorporate user interests in topic modeling, the correlations are forced to agree with user tastes. Specifically, a document corresponds to a media object and includes three parts of information: media content (or its description text), user-generated text data and ratings. We model media content in a similar way to the basic LDA model. For the user-generated text, however, we draw each topic from either the document's topics or the user's interest topics, because user-generated text, such as tags, is related to both the document and the user's interests. Finally, we model ratings based on the following assumption: a user will like a document if the user's interests and the document have topics in common. Given two topics chosen from the document and the user respectively, a rating is drawn from a rating distribution. Unlike the word distribution in the basic LDA model, which takes the form of a matrix (a second-order tensor), the rating distribution here is a third-order tensor. Based on this model, we can suggest documents to users either by predicting ratings or by directly comparing the topic distributions of documents and user interests. Experiments on two real-world datasets demonstrate that our proposed method outperforms baseline methods in cross domain recommendation.
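The rating component described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's exact formulation: the names `theta_d` (document topic distribution), `theta_u` (user interest topic distribution) and `psi` (a third-order tensor indexed by document topic, user topic and rating value) are hypothetical, and the toy numbers are invented for the example.

```python
import random
from math import sqrt


def sample_rating(theta_d, theta_u, psi, rating_values, rng):
    """Draw one rating: pick a document topic, a user topic, then a rating."""
    k = rng.choices(range(len(theta_d)), weights=theta_d)[0]
    l = rng.choices(range(len(theta_u)), weights=theta_u)[0]
    return rng.choices(rating_values, weights=psi[k][l])[0]


def expected_rating(theta_d, theta_u, psi, rating_values):
    """Expected rating, averaging over all (document topic, user topic) pairs."""
    total = 0.0
    for k, p_k in enumerate(theta_d):
        for l, p_l in enumerate(theta_u):
            mean_kl = sum(r * p for r, p in zip(rating_values, psi[k][l]))
            total += p_k * p_l * mean_kl
    return total


def topic_similarity(theta_d, theta_u):
    """Cosine similarity between two topic distributions (one possible choice)."""
    dot = sum(a * b for a, b in zip(theta_d, theta_u))
    norm_d = sqrt(sum(a * a for a in theta_d))
    norm_u = sqrt(sum(b * b for b in theta_u))
    if norm_d == 0 or norm_u == 0:
        return 0.0
    return dot / (norm_d * norm_u)


# Toy setting (assumed): two topics, ratings are either 1 or 5.
rating_values = [1, 5]
# psi[k][l] is a distribution over rating_values; matching topics favor rating 5.
psi = [
    [[0.1, 0.9], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
]
doc = [1.0, 0.0]            # document entirely about topic 0
fan = [1.0, 0.0]            # user interested only in topic 0
other_user = [0.0, 1.0]     # user interested only in topic 1

print(expected_rating(doc, fan, psi, rating_values))
print(expected_rating(doc, other_user, psi, rating_values))
print(sample_rating(doc, fan, psi, rating_values, random.Random(0)))
```

Both routes mentioned in the text appear here: `expected_rating` ranks documents by predicted rating, while `topic_similarity` compares the two topic distributions directly. Cosine similarity is just one plausible comparison; the source does not say which measure is used.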

The rest of this paper is organized as follows. Section 2 reviews the related work. In Section 3, we describe the data representation used in our method and define notations. In Section 4, we introduce our proposed method for cross domain recommendation. Extensive experimental results are presented in Section 5. We conclude the paper in Section 6.

Section snippets

Related work

In this paper, we propose to transfer user interests for cross media recommendation based on the LDA model. Our work is related to recommendation by transfer learning, cross media retrieval and recommendation based on the LDA model. In this section we provide a brief review of this work.

Data representation and notions

As represented in Fig. 1, the data comes from multiple domains, relating to different media types. To simplify the description, we only consider the situation of two domains here (different media types or the same media type with different categories). The document collections (i.e., items) from these domains are denoted by $D_1$ and $D_2$. For a document collection $D$ from a particular domain, $D=\{d_1,d_2,\ldots,d_{|D|}\}$, where $d_i$ is the $i$th document in the collection. To simplify the notation, we represent the

Transfer user interests cross media

In this section, we introduce the proposed cross media LDA model (cmLDA), which takes user interests into account. Before defining the model, we first present the basic LDA model and some of its extensions. Based on the model definition, we then introduce the inference method and parameter estimation by Gibbs sampling. Finally, we discuss how recommendation is performed with our model.
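As background for the inference step, the following is a minimal sketch of collapsed Gibbs sampling for the basic LDA model only. It does not implement cmLDA, whose sampling equations are not shown in this excerpt. The function name, the hyperparameter defaults and the toy corpus are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for basic LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, n_kw): per-document topic distributions and topic-word counts.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    z = []                                                # topic of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimate of each document's topic distribution.
    theta = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, n_kw


# Hypothetical toy corpus: word ids 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
theta, n_kw = lda_gibbs(docs, num_topics=2, vocab_size=4, iters=50)
for row in theta:
    print([round(p, 3) for p in row])
```

Each sweep removes a token's current topic from the counts, resamples the topic from its conditional distribution, and restores the counts. A model with extra components such as user interests and ratings would need additional count tables and conditionals that this sketch does not cover.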

Experiments

In this section, we investigate the use of our proposed algorithm for cross media recommendation. Before presenting the experimental results, we first introduce the experimental settings.

Conclusion

To alleviate the sparsity problem in recommender systems, we introduce a probabilistic collaborative filtering algorithm based on Latent Dirichlet Allocation model for cross domain or cross media recommendation. We first assume that documents (i.e., items) from different domains and user interests share a common topic space. Then topic distributions for documents and user interests are learned simultaneously. In this way, cross media recommendation can be done by comparing topic distributions

Acknowledgments

This work is partially supported by the National Basic Research Program of China (973 Program) under Grant No. 2013CB336500, the National Natural Science Foundation of China (Grant Nos. 61173186 and 61173185), and the National High Technology Research and Development Program of China (863 Program) under Grant No. 2013AA040601.

Shulong Tan received the BS degree in Software Engineering from Zhejiang University, China, in 2008. He is currently a PhD candidate in College of Computer Science, Zhejiang University, under the supervision of Prof. Chun Chen. His research interests include social network mining, recommender systems and text mining.

References (36)

  • G. Adomavicius et al., Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. (2005)
  • W. Pan, E. Xiang, N.N. Liu, Q. Yang, Transfer learning in collaborative filtering for sparsity reduction, in:...
  • B. Li, Q. Yang, X. Xue, Can movies and books collaborate? Cross-domain collaborative filtering for sparsity reduction,...
  • B. Li, Q. Yang, X. Xue, Transfer learning for collaborative filtering via a rating-matrix generative model, in:...
  • S.J. Pan et al., A survey on transfer learning, IEEE Trans. Knowl. Data Eng. (2009)
  • N.D. Phuong, T. M. Phuong, Collaborative filtering by multi-task learning, in: IEEE International Conference on...
  • A.P. Singh, G.J. Gordon, Relational learning via collective matrix factorization, in: Proceedings of the 14th ACM...
  • E.W. Xiang, N.N. Liu, S.J. Pan, Q. Yang, Knowledge transfer among heterogeneous information networks, in: IEEE...
  • Y. Zhang, B. Cao, D.-Y. Yeung, Multi-domain collaborative filtering, in: Proceedings of the 26th Conference on...
  • B. Cao, N.N. Liu, Q. Yang, Transfer learning for collective link prediction in multiple heterogenous domains, in:...
  • W. Pan, N.N. Liu, E. Xiang, Q. Yang, Transfer learning to predict missing ratings via heterogeneous user feedbacks, in:...
  • W. Pan, E. Xiang, Q. Yang, Transfer learning in collaborative filtering with uncertain ratings, in: Proceedings of the...
  • J. Tang, S. Wu, J. Sun, H. Su, Cross-domain collaboration recommendation, in: Proceedings of the 18th ACM SIGKDD...
  • J. Jeon, V. Lavrenko, R. Manmatha, Automatic image annotation and retrieval using cross-media relevance models, in:...
  • J.-Y. Pan, H.-J. Yang, C. Faloutsos, P. Duygulu, Automatic multimedia cross-modal correlation discovery, in:...
  • F. Monay et al., Modeling semantic aspects for cross-media image indexing, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • Y. Yang, D. Xu, F. Nie, J. Luo, Y. Zhuang, Ranking with local regression and global alignment for cross media...
  • N. Rasiwasia, J.C. Pereira, E.C. Gabriel Doyle, G.R. Lanckriet, R. Levy, N. Vasconcelos, A new approach to cross-modal...

    Jiajun Bu received the BS and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1995 and 2000, respectively. He is a professor in College of Computer Science, Zhejiang University. His research interests include embedded system, data mining, information retrieval and mobile database.

    Xuzhen Qin received his BS degree in Computer Science and Technology from Xiamen University, China, in 2011. He is currently working toward a master degree in College of Computer Science, Zhejiang University. His research interest includes recommender systems and data mining.

    Chun Chen received the BS degree in Mathematics from Xiamen University, China, in 1981, and his MS and Ph.D. degrees in Computer Science from Zhejiang University, China, in 1984 and 1990, respectively. He is a professor in College of Computer Science, Zhejiang University. His research interests include information retrieval, data mining, computer vision, computer graphics and embedded technology.

    Deng Cai is a Professor in the State Key Lab of CAD&CG, College of Computer Science at Zhejiang University, China. He received the Ph.D. degree in computer science from University of Illinois at Urbana Champaign in 2009. Before that, he received his Bachelor's degree and a Master's degree from Tsinghua University in 2000 and 2003 respectively, both in automation. His research interests include machine learning, data mining and information retrieval.
