Elsevier

Expert Systems with Applications

Volume 40, Issue 17, 1 December 2013, Pages 6758-6765

Pessimists and optimists: Improving collaborative filtering through sentiment analysis

https://doi.org/10.1016/j.eswa.2013.06.049

Highlights

  • We apply Sentiment Analysis in Recommender Systems by categorizing users according to the average polarity of their comments.

  • We have generated a new corpus of opinions on movies obtained from the Internet Movie Database (IMDb).

  • We improve Collaborative Filtering algorithms in rating prediction tasks.

Abstract

This work presents a novel application of Sentiment Analysis in Recommender Systems by categorizing users according to the average polarity of their comments. These categories are used as attributes in Collaborative Filtering algorithms. To test this solution, a new corpus of opinions on movies obtained from the Internet Movie Database (IMDb) has been generated, so both ratings and comments are available. The experiments stress the informative value of comments. By applying Sentiment Analysis approaches, some Collaborative Filtering algorithms can be improved in rating prediction tasks. The results indicate that a more reliable prediction is obtained by considering only the opinion text (RMSE of 1.868) than by applying similarities over the entire user community (RMSE of 2.134), and that sentiment analysis can be advantageous to recommender systems.
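
The RMSE figures quoted in the abstract can be read against the usual definition of root mean squared error. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `rmse` and the sample ratings are assumptions made up for illustration.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up predicted and true ratings, for illustration only.
print(rmse([7, 5, 9], [8, 5, 6]))
```

A lower value means the predicted ratings lie closer, on average, to the ratings users actually gave.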

Introduction

Today the Internet holds a huge amount of social and unstructured information, known as the Social Web. The number of online opinions and comments expressing thoughts about a variety of topics is constantly growing, and a large percentage of Internet users rely on these opinions and assessments to make decisions. Thousands of opinions and assessments on books, movies, travel, products or services populate the web every day.

In Information Retrieval, Recommender Systems (RS) are tools whose objective is to assist users in their information search processes, helping them to filter the retrieved items by means of the proposed item recommendations (Peña Henríquez & Carrillo, 2008). These recommendations are generated from other users' opinions on certain items or from the user profile and item description, leading to the two major RS approaches (Yager, 2003): collaborative-based or content-based. The former tries to find, for a given user, those users with similar interests, rating new products or recommending new items to the user from similar user profiles. The latter generates a profile of the user from their previously selected items and takes those items closest to this profile, which is characterized by item features rather than by similarities with other users. These systems are able to evaluate and filter the great amount of information available on the Internet to help users in their information search and retrieval processes (Herrera-Viedma, Herrera, Martínez, Herrera, & López, 2004). This is why recommender systems have been so relevant to many commercial activities, such as tourism (Ricci, 2002) or e-commerce (Schafer, Konstan, & Riedl, 1999), for more than a decade.
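
The collaborative approach described above can be sketched as a user-based neighbourhood predictor. This is an illustrative sketch under stated assumptions, not the algorithm evaluated in the paper: Pearson correlation as the similarity measure, mean-centred aggregation, the function names `pearson` and `predict`, and the toy ratings are all choices made for the example.

```python
import math


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items that both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = mean(a[i] for i in common)
    mean_b = mean(b[i] for i in common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0


def predict(ratings, user, item):
    """Predict a rating from positively correlated users who rated the item."""
    user_mean = mean(ratings[user].values())
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore dissimilar users (an assumption of this sketch)
        # Weight the neighbour's deviation from their own mean rating.
        num += sim * (other_ratings[item] - mean(other_ratings.values()))
        den += sim
    # Fall back to the user's own mean when no useful neighbour exists.
    return user_mean + num / den if den else user_mean


# Toy user -> {item: rating} data, invented for illustration.
ratings = {
    "ana": {"m1": 8, "m2": 6, "m3": 9},
    "ben": {"m1": 7, "m2": 5, "m3": 8, "m4": 6},
    "eva": {"m1": 3, "m2": 9, "m4": 10},
}
print(predict(ratings, "ana", "m4"))
```

Note that the prediction for "ana" on "m4" uses no text at all: it relies only on how similar users rated that item, which is the defining trait of the collaborative approach.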

In this paper, a proposal for the application of Sentiment Analysis (SA) in recommender systems is detailed. First, the relation between comments and ratings is explored, to justify treating comments as a valuable source of information. Then, a strategy for incorporating this knowledge is proposed. This approach categorizes users into two distinct groups: optimists and pessimists. The remaining experiments analyze how these categories can be used in collaborative filtering methods and how to perform this categorization using sentiment analysis solutions.
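
The categorization step can be sketched as follows. This is a minimal illustration that assumes each comment already carries a numeric polarity score from some sentiment classifier; the function name `categorize_users`, the score range of -1 to 1, and the threshold of 0.0 are hypothetical choices, not values taken from the paper.

```python
from statistics import mean


def categorize_users(comment_polarity, threshold=0.0):
    """Label each user by the average polarity of their comments.

    comment_polarity maps a user id to a list of polarity scores, assumed
    here to lie in [-1, 1]. Users without comments are left unlabeled.
    """
    labels = {}
    for user, scores in comment_polarity.items():
        if not scores:
            continue  # no comments, so no basis for a category
        average = mean(scores)
        labels[user] = "optimist" if average >= threshold else "pessimist"
    return labels


# Invented polarity scores, for illustration only.
polarity = {
    "u1": [0.8, 0.4, -0.1],
    "u2": [-0.6, -0.2, 0.1],
    "u3": [],
}
print(categorize_users(polarity))
```

The resulting labels could then be attached to user profiles as an extra attribute for a collaborative filtering method, in line with the strategy the abstract describes.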

In order to perform these experiments, a corpus with both comments and ratings on a large set of items and users is needed. The main corpora known to the recommender systems community do not include textual opinions. Thus, a new corpus has been built from the Internet Movie Database (IMDb). Some details on the generation of this corpus are also explained in this paper.

The rest of the paper is organized as follows. Section 2 provides a brief review of the state of the art in opinion mining and collaborative filtering. Section 3 then describes the main features of the corpus and its generation. Section 4 walks through all the experiments performed, showing how valuable textual information can be and how it has been used in collaborative filtering algorithms. Finally, Section 5 highlights the contributions of this work and the future tasks needed to continue this line of research.

Section snippets

State of the art

Recommender systems (Ricci, Rokach, & Shapira, 2011) mainly address two kinds of problems: rating prediction and item recommendation. Rating prediction focuses on automatically calculating the score that a given user would assign to a given item not yet known (or seen, bought…) by this user. Item recommendation is an extension of the former, proposing new products to the user that may satisfy his/her expectations. Basically, both problems are treated similarly. The first recommender systems

IMDb corpus

In order to perform the experiments, a corpus is needed to train and test a recommender system (items rated by users) that also incorporates textual reviews or opinions, so that sentiment analysis approaches can be applied to the texts that users write about items. The Internet Movie Database (IMDb) is a large online database that provides information on movies. It started in 1990 as a hobby of a group of movie and TV show fans. IMDb provides a

Rating prediction experiments

One of the tasks that recommender systems solve is predicting the score that a user would give to an item (known as rating prediction). Collaborative filtering algorithms used in recommender systems usually do not pay attention to textual information. With the aim of checking whether user reviews are helpful in this task, we perform a series of experiments that allow us to answer the following questions, in a sequence that defines the rationale behind our study:

  • 1. Is there an implicit relationship between a user’s comments

Conclusions and ongoing work

The most interesting aspect of collaborative filtering algorithms, compared with well-known text mining approaches, is that we can estimate a user’s score on a movie without having any comment, i.e., compute a distance between the user and the item when there is no relation at all. This is what makes these algorithms so valuable for recommending new items, whereas the previous solution cannot recommend new products, as we cannot know a user’s opinion in advance. But the results

References (45)

  • G. Bafoutsou et al. Review and functional classification of collaborative systems. International Journal of Information Management (2002)
  • E. Herrera-Viedma et al. Incorporating filtering techniques in a fuzzy linguistic multi-agent model for information gathering on the web. Fuzzy Sets and Systems (2004)
  • R. Yager. Fuzzy logic methods in recommender systems. Fuzzy Sets and Systems (2003)
  • S. Aciar et al. Informed recommender: Basing recommendations on consumer product reviews. Intelligent Systems, IEEE (2007)
  • Alves, D., Freitas, M., Moura, T., & Souza, D. (2013). Using social network information to identify user contexts for...
  • R.M. Bell et al. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter (2007)
  • R. Bell et al. Modeling relationships at multiple scales to improve accuracy of large recommender systems
  • Blanco Fernández, Y. (2007). Propuesta metodológica para el razonamiento semántico en sistemas de recomendación...
  • E. Boldrini et al. Emotiblog: a finer-grained and more precise learning of subjectivity expression models
  • R. Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-adapted Interaction (2002)
  • E. Cambria et al. (2012)
  • Chen, L., Wang, W., Nagarajan, M., Wang, S., & Sheth, A. (2012). Extracting diverse sentiment expressions with...
  • S. Debnath et al. Feature weighting in content based recommendation system using social network analysis
  • C. Dellarocas. Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior
  • Esuli, A., & Sebastiani, F. (2006). Sentiwordnet: A publicly available lexical resource for opinion mining. In...
  • P. Foltz et al. Personalized information delivery: An analysis of information filtering methods. Communications of the ACM (1992)
  • Galán Nieto, S. (2007). Filtrado colaborativo y sistemas de...
  • R. Gemulla et al. Large-scale matrix factorization with distributed stochastic gradient descent
  • Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Machine...
  • S.D. Kamvar et al. We feel fine and searching the emotional web
  • R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection
  • Y. Koren. Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Trans. Knowl. Discov. Data (2010)

This work has been funded by the Fondo Europeo de Desarrollo Regional (FEDER), the TEXT-COOL 2.0 project (TIN2009-13391-C04-02) and the ATTOS project (TIN2012-38536-C03-0) from the Spanish Government. This study is also partially funded by the European Commission under the Seventh (FP7-2007-2013) Framework Programme for Research and Technological Development through the FIRST project (FP7-287607). This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
