Unified collaborative filtering model based on combination of latent features

https://doi.org/10.1016/j.eswa.2010.02.044

Abstract

Collaborative filtering (CF) has been studied extensively in the literature and has been applied successfully in many different types of personalized recommender systems. In this paper, we propose a unified method that combines the latent and external features of users and items for accurate recommendation. A scheme that maps the collaborative filtering problem to a text analysis problem is introduced, and probabilistic latent semantic analysis is used to calculate the latent features from the historical rating data. The main advantages of this technique over standard memory-based methods are higher accuracy, constant-time prediction, and an explicit and compact model representation. The experimental evaluation shows that substantial improvements in accuracy over existing methods can be obtained.

Introduction

Because they help users deal with information overload and provide personalized recommendations, recommender systems have become an important research area since the first paper on collaborative filtering appeared in the mid-1990s (Adomavicius & Tuzhilin, 2005). Recommender systems use historical data on user preferences and purchases, together with other available data on users and items, to recommend items that might be of interest to a new user. One of the earliest techniques developed for recommendation is based on nearest-neighbor collaborative filtering algorithms, which take the history of user preferences as input. Nearest-neighbor methods rely on some notion of similarity between the user for whom predictions are being generated and the other users. Variations on this notion of similarity and other aspects of memory-based algorithms are discussed by Breese, Heckerman, and Kadie (1998) and Deshpande and Karypis (2004).
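The nearest-neighbor idea described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a mean-centered weighted average as the prediction rule, which is one common memory-based variant rather than the specific formulation of any cited work. The function names `pearson` and `predict` and the toy rating matrix are hypothetical.

```python
import numpy as np

def pearson(a, b, mask):
    """Pearson correlation over the items both users rated (given by mask)."""
    if mask.sum() < 2:
        return 0.0
    x = a[mask] - a[mask].mean()
    y = b[mask] - b[mask].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0

def predict(ratings, user, item, k=2):
    """Predict ratings[user, item]; NaN marks a missing rating."""
    rated = ~np.isnan(ratings)
    user_mean = np.nanmean(ratings[user])
    sims = []
    for other in range(ratings.shape[0]):
        if other == user or not rated[other, item]:
            continue
        common = rated[user] & rated[other]
        sims.append((pearson(ratings[user], ratings[other], common), other))
    # Keep the k neighbors with the largest absolute similarity.
    sims.sort(key=lambda t: abs(t[0]), reverse=True)
    top = sims[:k]
    norm = sum(abs(s) for s, _ in top)
    if norm == 0:
        return float(user_mean)
    # Weighted sum of each neighbor's deviation from its own mean rating.
    dev = sum(s * (ratings[o, item] - np.nanmean(ratings[o])) for s, o in top)
    return float(user_mean + dev / norm)

# Toy data (illustrative only): rows are users, columns are items.
toy = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, 3.0, 4.0, 1.0],
    [1.0, 1.0, 2.0, 5.0],
])
estimate = predict(toy, user=0, item=2)
```

Every prediction scans all other users, so the cost of a single prediction grows with the number of users.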

Although nearest-neighbor user-based collaborative filtering is the most successful technology for building recommender systems to date and is used extensively in many commercial recommender systems, it suffers from a scalability problem: its computational complexity grows linearly with the number of customers, which in typical commercial applications can reach several million. To address these scalability concerns, model-based recommendation techniques have been developed. A simple probabilistic approach to collaborative filtering was proposed by Breese et al. (1998), in which an unknown rating is calculated as the expected value of the active user’s rating of the item, given the known rating values. A collaborative filtering method in a machine learning framework was introduced by Billsus and Pazzani (1998), in which various machine learning techniques (such as artificial neural networks) can be coupled with feature extraction techniques (such as singular value decomposition, an algebraic technique for reducing the dimensionality of matrices). Zhang and Vijay (2002) used different linear classifiers to predict the user’s preference and compared this method with another model-based method using decision trees and with memory-based methods, using data from various domains. The experimental results indicate that linear models are well suited to this application. Ungar and Foster (1998) reviewed the two-sided clustering model for collaborative filtering and described how this model can be represented by a Bayesian network (BN) and as a probabilistic relational model (PRM). The probabilistic latent semantic analysis (pLSA) model is a new probabilistic graphical model of users’ purchase behavior (Hofmann, 2004); it relies on a statistical modeling technique to discover user communities and prototypical interest profiles.
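As a rough illustration of how a pLSA-style model extracts latent features from count data, the following minimal sketch fits the aspect model with EM in NumPy. The function name `plsa`, the toy count matrix, and the treatment of users as "documents" and items as "words" are assumptions made for illustration; they are not the mapping scheme proposed in this paper.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) to a (docs x words) count matrix with EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization, normalized so each row is a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / np.maximum(joint.sum(axis=2, keepdims=True), 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, :, None] * post
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=1)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_z_d, p_w_z

# Toy co-occurrence counts (illustrative only): 4 users x 4 items.
toy_counts = np.array([
    [4.0, 3.0, 0.0, 0.0],
    [5.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])
user_latent, item_given_topic = plsa(toy_counts, n_topics=2)
```

Under these assumptions, each row of `user_latent` is a distribution over latent classes and can serve as a compact latent feature vector for that user.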

Although both users and items have external features, most traditional collaborative filtering models ignore them and focus only on historical user ratings. Robin (2002) reviewed several hybrid recommender methods developed to combine external features and historical rating data for higher prediction accuracy. The reported experimental results suggest that both the features and the historical ratings are valuable for estimating the prediction function used for recommendation. Claypool and Gokhale (1999) introduced a simple linear combination of recommendation scores from different recommenders. It initially gives the collaborative and content-based recommenders equal weight, but gradually adjusts the weighting as predictions about user ratings are confirmed or disconfirmed. The weight adjustment is crucial to prediction performance, and it requires running the system over a long period to minimize past error. Basu, Hirsh, and Cohen (1998) presented a method that exploits both user ratings and content features in recommending movies. The method treats collaborative filtering as a classification problem that combines the content features of the movies with collaborative features, such as “Users who liked many movies of genre X”, generated from the historical ratings. Although it achieved significant improvements in precision over a pure collaborative filtering approach, the selection of collaborative and content features depends on expert experience.
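The weighted combination described above can be sketched as follows. The equal starting weights come from the description; the update rule (move a fixed step toward whichever recommender had the smaller absolute error) is an assumption chosen for simplicity, and the class name `WeightedHybrid` and its `step` parameter are hypothetical.

```python
class WeightedHybrid:
    """Linear blend of a collaborative score and a content-based score."""

    def __init__(self, step=0.1):
        self.w_cf = 0.5   # collaborative weight; content weight is 1 - w_cf
        self.step = step  # assumed fixed adjustment size

    def predict(self, cf_score, cb_score):
        return self.w_cf * cf_score + (1.0 - self.w_cf) * cb_score

    def update(self, cf_score, cb_score, actual):
        """Shift weight toward the recommender that was closer to the truth."""
        cf_err = abs(cf_score - actual)
        cb_err = abs(cb_score - actual)
        if cf_err < cb_err:
            self.w_cf = min(1.0, self.w_cf + self.step)
        elif cb_err < cf_err:
            self.w_cf = max(0.0, self.w_cf - self.step)

hybrid = WeightedHybrid()
before = hybrid.predict(cf_score=4.0, cb_score=2.0)
hybrid.update(cf_score=4.0, cb_score=2.0, actual=4.0)
after = hybrid.predict(cf_score=4.0, cb_score=2.0)
```

Because the weights only change after true ratings are observed, such a scheme needs a long feedback history before the blend settles.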

Inspired by the hybrid recommender methods, we present a unified model that learns a classification model from both historical ratings and external features for higher prediction accuracy. The new method has two stages: first, it uses the historical rating data to obtain latent features for users and items; second, a traditional function-approximation model, such as an artificial neural network, is used to establish the rating function on the extended feature space. In the rest of the paper, we introduce the unified CF model in Section 2, describe the method for mining the latent features in Section 3, report the empirical evaluation of the proposed approach in Section 4, and present the conclusions and a review of the approach in Section 5.
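The second stage can be pictured with a small sketch of the extended feature space. It assumes the latent features from the first stage are already available as vectors, and it substitutes an ordinary least-squares fit for the neural network purely to keep the example short. All names (`extended_features`, `fit_linear`, `predict_linear`) and all numbers are hypothetical.

```python
import numpy as np

def extended_features(user_latent, user_external, item_latent, item_external):
    """Concatenate latent and external features of one (user, item) pair."""
    return np.concatenate([user_latent, user_external, item_latent, item_external])

def fit_linear(X, y):
    """Least-squares stand-in for the function-learning model (with bias)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return weights

def predict_linear(weights, x):
    return float(np.append(x, 1.0) @ weights)

# One illustrative pair: 2 latent + 2 external features on each side.
x_pair = extended_features(
    user_latent=np.array([0.7, 0.3]),
    user_external=np.array([0.25, 1.0]),   # e.g. scaled age, encoded gender
    item_latent=np.array([0.6, 0.4]),
    item_external=np.array([0.9, 0.0]),
)

# Synthetic training set with the same feature width (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.random((40, x_pair.size))
y_train = X_train @ np.arange(1.0, x_pair.size + 1.0) + 0.5
model = fit_linear(X_train, y_train)
rating_estimate = predict_linear(model, x_pair)
```

Any regressor or classifier that accepts a fixed-length vector could replace the least-squares fit; the point of the sketch is only that rating prediction becomes ordinary supervised learning once each (user, item) pair has a feature vector.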

Unified model for collaborative filtering

Traditional CF algorithms rely on two kinds of reasonable assumptions. One is based on the users’ historical ratings: if two users rate most items similarly, they will rate other items similarly. Symmetrically, if two items are rated similarly by most users, they will be rated similarly by a specific user. The user-based and item-based collaborative filtering methods employ these assumptions, respectively. Another assumption is based on the external user and

Foundation of probabilistic latent semantic analysis

As the name indicates, pLSA has been largely inspired and influenced by latent semantic analysis (LSA) (Deerwester et al., 1990), which can be applied to any type of count data over a discrete dyadic domain, for example in the analysis and retrieval of text documents. Suppose that there is a collection of text documents D = {d1, ..., dN} with terms from a vocabulary W = {w1, ..., wM}. By ignoring the sequential order in which words occur in a document, one may summarize the data in a rectangular co-occurrence

Data set

Several popular benchmark data sets are used in collaborative filtering research, such as EachMovie, Jester, and MsWeb. However, most of these data sets contain only anonymous users. Only the EachMovie data set provides some basic information about users; therefore, we use this data set in our experiments. The user’s age, gender, and occupation were selected as the external features of user objects. For the movie objects, the movie title, release date, and video release date were

Conclusions and future work

In this paper, collaborative filtering is investigated as a classification problem, and the probabilistic latent semantic analysis method is used to construct the latent features for both items and users. A simple and effective unified collaborative filtering model has been presented, which takes advantage of the historical rating data as well as the external features. This method achieves competitive recommendation and prediction accuracies because it does not need any prior expert

Acknowledgements

This work is supported by the Natural Science Foundation Project of CQ CSTC (2008BB2195) and the Australian ARC Large Grant DP0558879.

References (16)

  • Adomavicius, G., et al. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering.
  • Basu, C., Hirsh, H., & Cohen, W. (1998). Recommendation as classification: Using social and content-based information...
  • Battiti, R. (1992). First and second order methods for learning: Between steepest descent and Newton’s method. Neural Computation.
  • Billsus, D., & Pazzani, M. (1998). Learning collaborative information filters. In Machine learning: Proceedings of the...
  • Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative...
  • Claypool, M., Gokhale, A., et al. (1999). Combining content-based and collaborative filters in an online newspaper. In...
  • Deerwester, S., et al. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science.
  • Deshpande, M., et al. (2004). Item-based top-N recommendation algorithms. ACM Transactions on Information Systems.
There are more references available in the full text version of this article.