Leveraging sentiment analysis at the aspects level to predict ratings of reviews

doi:10.1016/j.ins.2018.04.009

Information Sciences

Volumes 451–452, July 2018, Pages 295-309

Abstract

Online reviews are an important asset for users deciding whether to buy a product, see a movie, or go to a restaurant, and for managers making business decisions. Reviews on e-commerce websites usually carry ratings, which help users learn from them. However, many reviews spread across forums and social media are written in plain text without ratings; this paper calls them non-rated reviews. From the perspective of sentiment analysis at the aspects level, this study develops a predictive framework for calculating ratings for non-rated reviews. The framework began with an observation: the sentiment of an aspect is determined by its context, and the rating of a review depends on the sentiments of its aspects and on the number of positive and negative aspects it contains. Viewing the term pairs that co-occur with aspects as their context, we devised a variant of the Conditional Random Field model, called SentiCRF, for generating term pairs and calculating their sentiment scores from a training set. We then developed a cumulative logit model that uses the aspects in a review and their sentiments to predict the review's rating. Because calculating the sentiment scores of term pairs runs into class imbalance, we also devised a heuristic re-sampling algorithm to address it. Experiments on the Yelp dataset demonstrate that the framework is feasible and effective at predicting the ratings of reviews.

Introduction

Online reviews are an important asset for users deciding whether to buy a product, see a movie, or go to a restaurant, and for managers making business decisions. In the context of e-commerce, reviews usually refer to the text posted under the products, services, or businesses shown on an e-commerce website. They are typically accompanied by star ratings ranging from 1 star to 5 stars, which help visitors learn from the reviews. Fig. 1 shows an example of such a review on the Amazon website for the Samsung Galaxy S6, to which a 3-star rating is assigned.

However, other types of reviews are also widely spread across forums or social media. They do not show significant differences compared with those on the e-commerce website, except for the lack of ratings. This paper calls them non-rated reviews. For example, Fig. 2 illustrates a tweet on Twitter that addresses the iPhone 7.

Many business intelligence (BI) applications treat text messages scattered across the Internet as important data sources. If these applications could provide ratings for non-rated reviews, it would help substantially in making business decisions. For example, a system could collect reviews from the Internet that are associated with a certain business and calculate star ratings for them. It could then aggregate the rated reviews into a dynamic performance measure that lets managers conveniently gain insight into the business. Predicting the ratings of non-rated reviews could therefore be a valuable tool for building business intelligence applications.

Rating prediction is a subfield of sentiment classification [3], [4], [9], [24], [26]. It has attracted much interest since it emerged in 2005 [8]. For instance, TASS, an experimental evaluation workshop on sentiment analysis for the Spanish language, has been held since 2012. Participants are expected to submit experiments for a 6-label evaluation (strong positive, positive, neutral, negative, strong negative, and a no-sentiment tag) on a public corpus. The six labels are analogous to star ratings on e-commerce websites.

However, most previous research regards predicting the ratings of reviews as a document-level multilabel classification task [6], [7], [8]. Examining reviews on e-commerce websites, we arrived at Observation 1, stated below. Motivated by this observation, we believe that the performance of rating prediction can be significantly improved by leveraging sentiment analysis at the aspects level.

Observation 1: Every review involves at least one aspect; almost all of the aspects in a 5-star review carry positive sentiment; the aspects of a 1-star review tend to carry negative sentiment; and a 3-star review comprises a nearly equal number of positive and negative aspects.

Fig. 3 shows a 3-star review taken from the Yelp website, where the aspects highlighted in red have positive sentiments and those highlighted in yellow have negative sentiments. In this review, the number of positive aspects is almost equal to the number of negative aspects.

This study intends to build a predictive framework based on sentiment analysis at the aspects level to provide star ratings for non-rated reviews. Observation 1 is the cornerstone of this study, and we provide experimental evidence for its reliability (see Section 4.3 for details). Motivated by Observation 1, the task of predicting ratings can be decomposed into three steps: extracting the aspects, obtaining their sentiments, and then predicting the rating based on the aspects. To calculate the sentiment scores of the aspects, we must derive the context of the aspects, i.e., the terms that co-occur with the aspects in the same sentence, because the sentiment of an aspect depends on its context. Traditional lexicon-based sentiment analysis uses the sentiment scores of words provided by a lexicon to determine the sentiment scores of phrases, sentences, and even documents. Prior work [4], [5], [25] has shown that the sentiments of words can change depending on their context. Hence, in this study, we use the term pair 〈w₁, w₂〉 as a basic element, where each of the terms w₁ and w₂ is considered the context of its counterpart. We use the list of term pairs that co-occur with an aspect as the context of the aspect. By calculating the sentiment scores of the term pairs and then aggregating them, we obtain the sentiment score of the aspect. Furthermore, we develop a cumulative logit model that uses the sentiment scores of the aspects as features for predicting the ratings of the reviews. The main contributions of this study include the following:

(1) Develop a variant of the Conditional Random Field model, called SentiCRF, to build term pairs and calculate their sentiment scores.
(2) Conceive of a heuristic re-sampling algorithm to address the class imbalance that is encountered when we train the SentiCRF model.
(3) Build a framework to predict the ratings of non-rated reviews.
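
The pipeline described in this introduction (term pairs as the context of an aspect, aggregation of their scores, and a cumulative logit model on top) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice to build pairs from the non-aspect terms of a sentence, aggregation by summation, the single scalar feature, and all numeric values are assumptions made only for illustration. The rating step uses the standard proportional-odds form P(Y ≤ k) = σ(θ_k − βx), which may differ from the parameterization used in the paper.

```python
import math
from itertools import combinations


def term_pairs(sentence_terms, aspect):
    """Unordered pairs of the terms that co-occur with `aspect` in one sentence.

    Assumption: pairs are built from the non-aspect terms of the sentence.
    """
    context = sorted({t for t in sentence_terms if t != aspect})
    return list(combinations(context, 2))


def aspect_score(pairs, pair_scores):
    """Aggregate term-pair scores by summation (assumed); unknown pairs count as 0."""
    return sum(pair_scores.get(p, 0.0) for p in pairs)


def rating_probabilities(x, beta, thresholds):
    """Proportional-odds model: P(Y <= k) = sigmoid(theta_k - beta * x).

    `thresholds` must be increasing; K thresholds yield K + 1 rating classes.
    """
    cdf = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thresholds] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


# Hypothetical term-pair scores, standing in for a trained model's output.
pair_scores = {("fresh", "tasty"): 1.2, ("rude", "slow"): -0.9}

pairs = term_pairs(["pizza", "fresh", "tasty"], "pizza")
score = aspect_score(pairs, pair_scores)
# Hypothetical coefficient and thresholds for a 1-star to 5-star scale.
probs = rating_probabilities(score, beta=1.0, thresholds=[-2.0, -1.0, 1.0, 2.0])
predicted_stars = probs.index(max(probs)) + 1
```

In this sketch a single aggregated score drives the prediction; a fuller model would presumably use one feature per aspect or per sentiment polarity, as the text suggests.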

The remainder of this paper is organized as follows. Section 2 reviews the related research on predicting the star ratings of the reviews. Section 3 introduces a predictive framework. Section 4 presents the experimental results. We also discuss interesting findings in Section 5. Section 6 offers the concluding remarks.

Related work

The task of predicting the star ratings of reviews originates from the sentiment classification of reviews, i.e., classifying reviews as recommended (thumbs up) or not recommended (thumbs down) [3]. Research work for review classification can be divided into machine learning methods, lexicon-based methods and hybrid methods. Some studies see review classification as a problem of text classification, which employs traditional machine learning methods, e.g., SVM (Support Vector Machine) [1] or

A predictive framework

To better present this study, we first provide three definitions. The Yelp website uses the term business to refer to the target of the reviews. We also employ this term to present our work.

Definition 1 (Business). This study uses business as a general name for the items, products, or services presented on an e-commerce website that are the targets of reviews.

Liu [17] uses the term entity to indicate the products and uses aspects to refer to the features or attributes of the

Experiments

In this section, we evaluate both the feasibility and the performance of the predictive framework developed in this study using the Yelp dataset. The experiments are conducted on a PC with an Intel i7 processor and 32 GB RAM.

SentiCRF model

In Section 4, we employ the Yelp dataset to train the SentiCRF model. We list the top 20 positive sentiment and negative sentiment term pairs in Table 8. We rank the term pairs by their sentiment score, sc, which is calculated using the formula $sc = \log(\lambda_{pos} + 1) - \log(\lambda_{neg} + 1)$. If sc is larger than 0, then the term pair exhibits a positive sentiment; otherwise, it indicates a negative sentiment.
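
The scoring rule above can be written as a short Python sketch. The function names are hypothetical, and the sketch assumes that λ_pos and λ_neg are non-negative weights associated with a term pair for the positive and negative classes; the excerpt does not state how they are obtained from the SentiCRF model.

```python
import math


def sentiment_score(lambda_pos, lambda_neg):
    """sc = log(lambda_pos + 1) - log(lambda_neg + 1)."""
    return math.log(lambda_pos + 1) - math.log(lambda_neg + 1)


def polarity(lambda_pos, lambda_neg):
    """Positive if sc > 0; otherwise negative, following the rule in the text."""
    return "positive" if sentiment_score(lambda_pos, lambda_neg) > 0 else "negative"


# Illustrative values only, not taken from the paper.
strong_positive = sentiment_score(3.0, 0.0)   # log(4) - log(1)
mild_negative = sentiment_score(0.5, 2.0)     # log(1.5) - log(3)
```

Adding 1 inside each logarithm keeps the score defined when a weight is 0, and equal weights give sc = 0, which the rule in the text assigns to the negative class.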

From observing Table 8, we find that ‘downside’ surprisingly shows at the top of the list of positive

Conclusions

Predicting the ratings of the reviews is a task that is both interesting and challenging. This task could be a basic and even key component in a business intelligence application. In this paper, we propose a predictive framework to meet the challenges. We develop a SentiCRF model to build a collection of term pairs and to calculate their sentiments from a training set. To predict the star rating of a non-rated review, we extract the aspects of the review and their contexts, i.e., term pairs

Acknowledgments

This work was supported by the National Natural Science Foundation of China (no. 71571145), the Humanities and Social Science Foundation of the Ministry of Education of China (no. 14YJAZH063), and the Fundamental Research Funds for the Central Universities (nos. JBK120505 and JBK150503).

References (28)

R. Prabowo et al., Sentiment analysis: a combined approach, J. Inf. (2009)
A. Ortigosa et al., Sentiment analysis in Facebook and its application to e-learning, Comput. Hum. Behav. (2014)
V. García et al., On the effectiveness of preprocessing methods when dealing with different levels of class imbalance, Knowl. Based Syst. (2012)
Y. Freund et al., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci. (1997)
O. Appel et al., A hybrid approach to the sentiment analysis problem at the sentence level, Knowl. Based Syst. (2016)
A. Muhammad et al., Contextual sentiment analysis for social media genres, Knowl. Based Syst. (2016)
S. Poria et al., Aspect extraction for opinion mining with a deep convolutional neural network, Knowl. Based Syst. (2016)
B. Pang et al., Thumbs up? Sentiment classification using machine learning techniques
Li, F., et al., Incorporating reviewer and product information for review rating prediction, Proceedings of the...
P.D. Turney, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews
T. Wilson et al., Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis, Comput. Ling. (2009)
A. Weichselbraun et al., Extracting and grounding context-aware sentiment lexicons, IEEE Intell. Syst. (2013)
Qu, L., Ifrim, G., Weikum, G., The bag-of-opinions method for review rating prediction from sparse text patterns,...
D. Tang et al., User modeling with neural network for review rating prediction

Published by Elsevier Inc.