ConSent: Context-based sentiment analysis
Introduction
Sentiment analysis refers to the inference of people’s views, positions and attitudes in their written or spoken texts. Before the coining of the term, the field was studied under names such as subjectivity [1], point of view [2] and opinion mining [3]. Nowadays, the field is rapidly evolving due to the rise of new platforms such as blogs, social media and user-generated reviews.
A large body of work exists on the analysis of latent sentiment in social media platforms such as Twitter. The goal of these studies is to extract timely and relevant information as well as to gauge widespread opinions and sentiment. Agarwal et al. [4], for example, used lexicons and machine learning methods to infer sentiment in tweets while Wang et al. [5] used graph and hashtag analysis to evaluate opinions regarding various topics. In industry, studies of this kind could be beneficial for corporations interested in engaging disgruntled customers and reducing customer turnover [6].
Another domain that has received increased attention in recent years is sentiment analysis of transcribed text, i.e., speech converted to text using automatic machine translation [7]. By analyzing transcribed call center conversations, for example, corporations could identify (and approach) dissatisfied customers, assess their attitudes towards services and products and identify problems at their early stages.
Sentiment analysis is considered a challenging natural language processing (NLP) problem [8], and particularly so for Twitter and transcribed text. Twitter is difficult to analyze due to the short length of the text as well as the non-standard abbreviations that are often used [9]. These two traits create a highly sparse representation of terms and difficulties in the identification of synonyms and other relations between terms. In transcribed text, the relatively high error rate of the transcription process [10] causes a decline in the performance of NLP techniques such as parsing. For example, parsing the transcribed utterance “got we can’t was may need to delete files” yields a different structure from parsing the true utterance “well we may need to delete some files.” The mistaken analysis is mainly due to the error at the root of the tree – got instead of well. Apart from errors in speech recognition, spontaneous speech is often grammatically erroneous and always without punctuation marks.
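The sparsity problem described above can be illustrated with a minimal sketch. The two example strings and the whitespace tokenizer are assumptions made for illustration only; they are not taken from the datasets used in this work.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard overlap between the token sets of two texts."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical pair: the same opinion in standard and tweet-style spelling.
standard = "the service was great thanks"
abbreviated = "srvc was gr8 thx"

# Only "was" is shared, so a bag-of-words model sees the texts as nearly unrelated.
overlap = jaccard(standard, abbreviated)
print(overlap)
```

Because "gr8" and "great" are distinct tokens, a term-based representation has no way of knowing they are synonyms unless that relation is supplied or learned separately.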
We present Context-based Sentiment analysis (ConSent), a novel approach for sentiment analysis which has proven effective both for regular texts (those that adhere to the rules of grammar and use existing words) and texts with a high degree of noise. Our proposed approach consists of two phases: we first apply techniques from the field of information retrieval to detect key terms in the text and analyze the context in which they appear. Then, we use the detected terms to generate features for supervised learning. ConSent has several traits that make it very suitable for sentiment analysis in general and noisy text in particular: first, it considers the context of terms, which is important when the text is noisy [11], [12]; secondly, it does not rely on grammatical structures, which may not be reliable in noisy text; and finally, it enables an easy integration of information from additional sources, such as the metadata of the analyzed text (e.g., the length of the text, the time of creation, the use of emoticons, etc.).
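The two phases can be pictured with the following minimal sketch. It is not the ConSent implementation: the term-scoring rule (a smoothed log-odds ratio of document frequencies), the fixed token window, the function names and the toy training documents are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no parsing or grammar is needed)."""
    return text.lower().split()


def select_key_terms(docs, labels, top_k=2):
    """Phase 1 (assumed scoring rule): rank terms by the absolute smoothed
    log-odds of their document frequency in positive vs. negative documents."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(docs, labels):
        (pos if label == 1 else neg).update(set(tokenize(doc)))
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos

    def score(term):
        p = (pos[term] + 1) / (n_pos + 2)
        q = (neg[term] + 1) / (n_neg + 2)
        return abs(math.log(p / q))

    vocab = set(pos) | set(neg)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda t: (-score(t), t))[:top_k]


def context_features(doc, key_terms, window=2):
    """Phase 2: count key-term hits and the tokens within `window`
    positions of each hit; all other tokens are ignored."""
    keys = set(key_terms)
    tokens = tokenize(doc)
    features = Counter()
    for i, tok in enumerate(tokens):
        if tok in keys:
            features["key=" + tok] += 1
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    features["ctx=" + tokens[j]] += 1
    return features


# Hypothetical toy training set: 1 = positive, 0 = negative.
docs = ["the room was great", "great staff and great view",
        "the room was terrible", "terrible noise all night"]
labels = [1, 1, 0, 0]

key_terms = select_key_terms(docs, labels)
feats = context_features("the view was not great at all", key_terms)
print(key_terms, dict(feats))
```

The resulting feature dictionaries could be passed to any supervised classifier. In this sketch the negation "not" is captured as a context feature of "great", while tokens outside the window, such as "view", are dropped.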
We tested our approach on three benchmark datasets: hotel reviews from the TripAdvisor website, movie reviews and Twitter. Additionally, we created two automatically transcribed datasets from two real-world call centers. While the review datasets are used to evaluate our approach in frequently used domains, the other datasets enable us to test ConSent in a more challenging setting: noisy and unstructured text. In our evaluation, ConSent fares better than the commonly used and highly effective baseline in the majority of cases.
The contribution of this paper is threefold: first, we present a novel method for sentiment analysis whose strength lies in its ability to effectively analyze context and handle noisy data. This stems from ConSent’s ability to selectively analyze terms that are in close proximity to one another while ignoring terms it deems irrelevant. Secondly, we evaluate our method on two datasets of transcribed phone conversations, obtained from real-world telecommunication companies. These difficult-to-obtain datasets enable us to conduct an accurate evaluation of our algorithm as well as to report on lessons learned from this real-world data. Thirdly, we propose a novel set of meta-features developed specifically for sentiment analysis in transcribed text. As shown by our experiments, these features can greatly enhance the performance of learning-based approaches.
The structure of the paper is as follows: in Section 2 we present related work; ConSent is presented in Section 3, and the evaluation process is described in detail in Section 4. Finally, we discuss our results and propose directions for future research in Section 5.
Section snippets
Sentiment analysis in text
The techniques utilized for sentiment analysis can be roughly divided into lexical and learning-based approaches. The lexical methods often use predefined dictionaries of terms annotated with “positive” or “negative” scores, indicating the strength (i.e., polarity) of the sentiment they represent. These dictionaries are then used to determine the sentiment of the analyzed document by examining the frequencies and other measures of the negative and positive words in the text. The lexical approach
The proposed approach
ConSent consists of two phases: learning and detection. During the learning phase we first generate a set of key terms and context terms from a training set. Then, we generate features based on these terms and train a classifier that will be used to analyze sentiment in the detection phase. The detection phase is designed as a classification task; we scan each document in search of the key and context terms, generate features based on the terms that were found, and use the classifier to
Evaluation
In order to conduct a comprehensive evaluation of our method, we used several types of datasets from different domains. To evaluate our method on text with little or no noise we used hotel reviews from TripAdvisor and movie reviews from the Internet Movie Database (IMDB). For evaluation on the more difficult task of noisy and short texts we used data from Twitter and automatically transcribed phone calls made to the call centers of two large telecommunication companies.
It is also important to
Summary and future work
In this study we present ConSent—a novel context-based method for sentiment analysis. Our approach builds on techniques from the field of information retrieval in order to identify key terms which are indicative of sentiment and the context in which they appear. Our experiments demonstrate that ConSent significantly outperforms leading baselines both in standard and “noisy” text.
The strength of our approach stems both from its focus on key terms and its heavy reliance on context, which has the
References (97)
- et al. Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. Expert Syst. Appl. (2009)
- The optimality of naive Bayes. AA (2004)
- et al. Leveraging word confusion networks for named entity modeling and detection from conversational telephone speech. Speech Commun. (2012)
- et al. CoBAn: a context based model for data leakage prevention. Inform. Sci. (2014)
- et al. Effectiveness of template detection on noise reduction and websites summarization. Inform. Sci. (2013)
- The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. (1997)
- et al. Agreement, the f-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. (2005)
- Centrality in social networks conceptual clarification. Soc. Netw. (1978)
- Observations and speculations on subjectivity. Iconicity Syntax (1985)
- (2002)
- Opinion mining and sentiment analysis. Found. Trends Inform. Ret.
- Sentiment analysis of twitter data
- Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach
- Loyalty Rules!: How Today’s Leaders Build Lasting Relationships
- Thumbs up?: Sentiment classification using machine learning techniques
- Boosting statistical machine translation by lemmatization and linear interpolation
- Recognizing contextual polarity in phrase-level sentiment analysis
- Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis. Comput. Linguist.
- Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews
- Predicting the semantic orientation of adjectives
- Mining and summarizing customer reviews
- Contextual valence shifters
- Fuzzy clustering for semi-supervised learning–case study: construction of an emotion lexicon
- Effects of adjective orientation and gradability on sentence subjectivity
- Mining product reputations on the web
- New avenues in opinion mining and sentiment analysis. IEEE Intell. Syst.
- Towards real-time measurement of customer satisfaction using automatically generated call transcripts
- A machine learning approach to sentiment analysis in multilingual Web texts. Inform. Ret.
- Sentiment analysis of blogs by combining lexical knowledge with text classification
- Choosing multiple parameters for support vector machines. Mach. Learn.
- Learning with compositional semantics as structural inference for subsentential sentiment analysis
- Learning word vectors for sentiment analysis
- Latent aspect rating analysis on review text data: a rating regression approach
- Weakly supervised techniques for domain-independent sentiment classification
- Crowdsourcing a wikipedia vandalism corpus
- Co-training for cross-lingual sentiment classification
- The relationships of customer satisfaction, customer loyalty, and profitability: an empirical study. Int J. Serv. Ind. Manage.
- Personal characteristics as moderators of the relationship between customer satisfaction and loyalty—an empirical analysis. Psychol. Market.
- Text mining of business-oriented conversations at a call center
Cited by (67)
A constrained optimization approach for cross-domain emotion distribution learning
2021, Knowledge-Based Systems
Citation Excerpt: Ablation experiments validate the effectiveness of the two contributions, i.e., the many-to-many relationships between document clusters and labels, and the content-based constraint. Sentiment analysis, which refers to the inference of users’ views, positions, and attitudes in their written or spoken documents [18–23], plays an increasingly important role in real-life applications [24–30]. Both lexical and learning-based approaches have been utilized for this task [31,32].
Kernel compositional embedding and its application in linguistic structured data classification
2020, Knowledge-Based Systems
Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information
2019, Cognitive Systems Research
Citation Excerpt: Most existing classification algorithms for social media sentiment analysis focus on tweet contents (document-level); for instance (Katz, Ofek, & Shapira, 2015; Khan, Qamar, & Bashir, 2016). As an illustration, let us take the algorithm named ConSent (for Context-based Sentiment Analysis), which is proposed in Katz et al. (2015). ConSent has two phases, learning and detection.
Three-way enhanced convolutional neural networks for sentence-level sentiment classification
2019, Information Sciences
A cost-sensitive three-way combination technique for ensemble learning in sentiment classification
2019, International Journal of Approximate Reasoning
Citation Excerpt: Paltoglou et al. [41] and Kim et al. [42] studied the feature weights by investigating variants weighting functions from information retrieval. Katz et al. [43] presented ConSent, a novel context-based approach which was effective both in noiseless and noisy text. Salvador et al. [44] used meta-learning to combine and enrich several baseline methods (bag of words, n-grams, lexical resource-based classifier) aiming at cross-domain polarity classification.
Sentiment classification with modified RoBERTa and recurrent neural networks
2024, Multimedia Tools and Applications