Pessimists and optimists: Improving collaborative filtering through sentiment analysis☆
Introduction
Today the Internet holds a huge amount of social, unstructured information, known as the Social Web. The number of online opinions and comments on a wide variety of topics is constantly growing, and a large percentage of Internet users rely on these opinions and assessments to make decisions. Thousands of opinions and assessments on books, movies, travel, products, and services populate the web every day.
In Information Retrieval, Recommender Systems (RS) are tools whose objective is to assist users in their information search processes, helping them to filter the retrieved items by means of item recommendations (Peña Henríquez & Carrillo, 2008). These recommendations are generated either from other users' opinions on certain items or from the user profile and item descriptions, leading to the two major RS approaches (Yager, 2003): collaborative and content based. The former tries to find, for a given user, those users with similar interests, and then predicts ratings for new products or recommends new items based on those similar user profiles. The latter builds a profile of the user from their previously selected items and selects the items closest to this profile, which is characterized by item features rather than by similarities with other users. These systems are able to evaluate and filter the great amount of information available on the Internet to help users in their information search and retrieval processes (Herrera-Viedma, Herrera, Martínez, Herrera, & López, 2004). This is why recommender systems have been so relevant to many commercial activities, such as tourism (Ricci, 2002) or e-commerce (Schafer, Konstan, & Riedl, 1999), for more than a decade.
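The collaborative approach described above can be illustrated with a minimal sketch of user-based rating prediction. This is not the method evaluated in the paper: the toy ratings, the user and item names, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-10 scale}.
ratings = {
    "ana":   {"m1": 8, "m2": 6, "m3": 9},
    "bob":   {"m1": 7, "m2": 5, "m3": 8, "m4": 9},
    "carla": {"m1": 2, "m2": 9, "m4": 3},
}

def cosine_similarity(a, b):
    """Cosine similarity between two users, computed on co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, item, ratings):
    """Similarity-weighted average of the ratings other users gave to `item`.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None

# Predict the score "ana" would give to "m4", an item she has not rated.
print(predict_rating("ana", "m4", ratings))
```

Because "ana" rates the shared items much like "bob" does, the prediction is pulled towards his rating of "m4" rather than towards "carla"'s. Cosine similarity on raw positive ratings discriminates weakly between users; mean-centred variants such as Pearson correlation are a common alternative.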
In this paper, a proposal for the application of Sentiment Analysis (SA) in recommender systems is detailed. First, the relation between comments and ratings is explored, to justify considering comments as a valuable source of information. Then, a strategy for incorporating this knowledge is proposed. This approach categorizes users into two distinct groups: optimists and pessimists. The remaining experiments analyze how these categories can be used in collaborative filtering methods and how to perform this categorization using sentiment analysis solutions.
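As a rough illustration of what splitting users into two groups could look like, the sketch below labels each user by comparing their mean rating with the global mean. This criterion is an assumption for illustration only; this excerpt does not state the rule the paper actually uses, and the function name and toy data are hypothetical.

```python
from statistics import mean

def categorize_users(ratings):
    """Label each user as 'optimist' or 'pessimist'.

    Assumed rule (not taken from the paper): a user whose mean rating is at
    or above the global mean rating is an optimist, otherwise a pessimist.
    """
    global_mean = mean(
        score for user_ratings in ratings.values() for score in user_ratings.values()
    )
    return {
        user: "optimist" if mean(user_ratings.values()) >= global_mean else "pessimist"
        for user, user_ratings in ratings.items()
    }

# Hypothetical toy data: user -> {item: rating on a 1-10 scale}.
toy_ratings = {
    "ana": {"m1": 9, "m2": 8, "m3": 10},
    "bob": {"m1": 3, "m2": 4, "m3": 2},
}
print(categorize_users(toy_ratings))
```

A collaborative filtering method could then, for example, restrict or re-weight neighbours according to these labels; how the paper exploits the categories is described in its experiments section.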
In order to perform these experiments, a corpus with both comments and ratings on a large set of items and users is needed. The main corpora known to the recommender systems community do not include textual opinions. Thus, a new corpus has been built from the Internet Movie Database (IMDb). Some details on the generation of this corpus are also explained in this paper.
The rest of the paper is organized as follows. In Section 2 a brief review of the state of the art in opinion mining and collaborative filtering is provided. Then, Section 3 describes the main features of the corpus and its generation. Section 4 walks through all the experiments performed, allowing the reader to understand how valuable textual information can be and how it has been used in collaborative filtering algorithms. Finally, in Section 5 we highlight the contributions of this work and the future tasks needed to continue this line of research.
Section snippets
State of the art
Recommender systems (Ricci, Rokach, & Shapira, 2011) mainly address two kinds of problems: rating prediction and item recommendation. Rating prediction focuses on automatically calculating the score that a given user would assign to a given item not yet known (or seen, bought…) by this user. Item recommendation is an extension of the former, proposing new products to the user that may satisfy his/her expectations. Basically, both problems are treated similarly. The first recommender systems
IMDb corpus
In order to perform the experiments, a corpus is needed to train and test a recommender system (items rated by users) that also incorporates textual reviews or opinions, so that sentiment analysis approaches can be applied to the texts that users write about items. The Internet Movie Database (IMDb) is a large online database that provides information on movies. It started in 1990 as a hobby of a group of movie and TV show fans. IMDb provides a
Rating prediction experiments
One of the tasks that recommender systems solve is the prediction of a score (known as rating prediction). Collaborative filtering algorithms used in recommender systems usually do not pay attention to textual information. With the aim of checking whether user reviews are helpful in this task, we perform a series of experiments that allow us to answer the following questions, in a sequence that defines the rationale behind our study:
- 1. Is there an implicit relationship between a user’s comments
Conclusions and ongoing work
The most interesting aspect of collaborative filtering algorithms, compared with well-known text mining approaches, is that we can estimate a user’s score on a movie without having any comment, i.e., compute a distance between the user and the item when there is no relation at all between them. This is what makes these algorithms so valuable for recommending new items, whereas the previous solution cannot recommend new products because we cannot know a user's opinion in advance. But the results
References (45)
- et al. Review and functional classification of collaborative systems. International Journal of Information Management (2002).
- et al. Incorporating filtering techniques in a fuzzy linguistic multi-agent model for information gathering on the web. Fuzzy Sets and Systems (2004).
- Fuzzy logic methods in recommender systems. Fuzzy Sets and Systems (2003).
- et al. Informed recommender: Basing recommendations on consumer product reviews. Intelligent Systems, IEEE (2007).
- Alves, D., Freitas, M., Moura, T., & Souza, D. (2013). Using social network information to identify user contexts for...
- et al. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter (2007).
- et al. Modeling relationships at multiple scales to improve accuracy of large recommender systems.
- Blanco Fernández, Y. (2007). Propuesta metodológica para el razonamiento semántico en sistemas de recomendación...
- et al. Emotiblog: a finer-grained and more precise learning of subjectivity expression models.
- Hybrid recommender systems: Survey and experiments. User Modeling and User-adapted Interaction (2002).
- Feature weighting in content based recommendation system using social network analysis.
- Immunizing online reputation reporting systems against unfair ratings and discriminatory behavior.
- Personalized information delivery: An analysis of information filtering methods. Communications of the ACM.
- Large-scale matrix factorization with distributed stochastic gradient descent.
- We feel fine and searching the emotional web.
- A study of cross-validation and bootstrap for accuracy estimation and model selection.
- Factor in the neighbors: Scalable and accurate collaborative filtering. ACM Trans. Knowl. Discov. Data.
☆ This work has been supported by the Fondo Europeo de Desarrollo Regional (FEDER), the TEXT-COOL 2.0 project (TIN2009-13391-C04-02), and the ATTOS project (TIN2012-38536-C03-0) from the Spanish Government. This study is also partially funded by the European Commission under the Seventh (FP7-2007-2013) Framework Programme for Research and Technological Development through the FIRST project (FP7-287607). This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.