Elsevier

Knowledge-Based Systems

Volume 84, August 2015, Pages 162-178

ConSent: Context-based sentiment analysis

https://doi.org/10.1016/j.knosys.2015.04.009

Abstract

We present ConSent, a novel context-based approach for the task of sentiment analysis. Our approach builds on techniques from the field of information retrieval to identify key terms indicative of the existence of sentiment. We model these terms and the contexts in which they appear and use them to generate features for supervised learning. The two major strengths of the proposed model are its robustness against noise and the easy addition of features from multiple sources to the feature set. Empirical evaluation over multiple real-world domains demonstrates the merit of our approach compared to state-of-the-art methods, on both noiseless and noisy text.

Introduction

Sentiment analysis refers to the inference of people’s views, positions and attitudes in their written or spoken texts. Before the coining of the term, the field was studied under names such as subjectivity [1], point of view [2] and opinion mining [3]. Nowadays, the field is rapidly evolving due to the rise of new platforms such as blogs, social media and user-generated reviews.

A large body of work exists on the analysis of latent sentiment in social media platforms such as Twitter. The goal of these studies is to extract timely and relevant information as well as to gauge widespread opinions and sentiment. Agarwal et al. [4], for example, used lexicons and machine learning methods to infer sentiment in tweets while Wang et al. [5] used graph and hashtag analysis to evaluate opinions regarding various topics. In industry, studies of this kind could be beneficial for corporations interested in engaging disgruntled customers and reducing customer turnover [6].

Another domain that has received increased attention in recent years is sentiment analysis of transcribed text, i.e., speech converted to text using automatic speech recognition [7]. By analyzing transcribed call center conversations, for example, corporations could identify (and approach) dissatisfied customers, assess their attitudes towards services and products and identify problems at an early stage.

Sentiment analysis is considered a challenging natural language processing (NLP) problem [8], and particularly so for Twitter and transcribed text. Twitter is difficult to analyze due to the short length of the text as well as the non-standard abbreviations that are often used [9]. These two traits create a highly sparse representation of terms and make it difficult to identify synonyms and other relations between terms. In transcribed text, the relatively high error rate of the transcription process [10] degrades the performance of NLP techniques such as parsing. For example, parsing the transcribed utterance “got we can’t was may need to delete files” yields a different structure from parsing the true utterance “well we may need to delete some files.” The mistaken analysis is mainly due to the error at the root of the tree: “got” instead of “well”. Apart from errors in speech recognition, spontaneous speech is often grammatically erroneous and its transcription lacks punctuation marks.
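To make the transcription error in the example above concrete, the following minimal sketch computes a word-level error rate between the true and the transcribed utterance. Word error rate is a standard speech recognition metric that the text does not name explicitly, so its use here is an assumption, and the function name word_error_rate is purely illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not from the paper): counts substitutions,
    insertions and deletions needed to turn the reference into the hypothesis.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


true_utterance = "well we may need to delete some files"
transcribed = "got we can't was may need to delete files"
print(word_error_rate(true_utterance, transcribed))
```

For this pair the alignment needs one substitution, two insertions and one deletion against eight reference words, which illustrates how a single short utterance can already carry several errors.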

We present Context-based Sentiment analysis (ConSent), a novel approach for sentiment analysis which has proven effective both for regular texts (those that adhere to the rules of grammar and use existing words) and texts with a high degree of noise. Our proposed approach consists of two phases: we first apply techniques from the field of information retrieval to detect key terms in the text and analyze the context in which they appear. Then, we use the detected terms to generate features for supervised learning. ConSent has several traits that make it very suitable for sentiment analysis in general and noisy text in particular: first, it considers the context of terms, which is important when the text is noisy [11], [12]; secondly, it does not rely on grammatical structures, which may not be reliable in noisy text; and finally, it enables an easy integration of information from additional sources, such as the metadata of the analyzed text (e.g., the length of the text, the time of creation, the use of emoticons, etc.).

We tested our approach on three benchmark datasets: hotel reviews from the TripAdvisor website, movie reviews and Twitter. Additionally, we created two automatically transcribed datasets from two real-world call centers. While the review datasets are used to evaluate our approach in frequently used domains, the other datasets enable us to test ConSent in a more challenging setting: noisy and unstructured text. In our evaluation, ConSent fares better than the commonly used and highly effective baseline in the majority of cases.

The contribution of this paper is threefold: first, we present a novel method for sentiment analysis whose strength lies in its ability to effectively analyze context and handle noisy data. This stems from ConSent’s ability to selectively analyze terms that are in close proximity to one another while ignoring terms it deems irrelevant. Secondly, we evaluate our method on two datasets of transcribed phone conversations, obtained from real-world telecommunication companies. These difficult-to-obtain datasets enable us to conduct an accurate evaluation of our algorithm as well as to report on lessons learned from this real-world data. Thirdly, we propose a novel set of meta-features – developed specifically for sentiment analysis in transcribed text. As shown by our experiments, these features can greatly enhance the performance of learning-based approaches.

The structure of the paper is as follows: in Section 2 we present related work; ConSent is presented in Section 3, and the evaluation process is described in detail in Section 4. Finally, we discuss our results and propose directions for future research in Section 5.

Section snippets

Sentiment analysis in text

The techniques utilized for sentiment analysis can be roughly divided into lexical and learning-based approaches. The lexical methods often use predefined dictionaries of terms annotated with “positive” or “negative” scores, indicating the polarity and strength of the sentiment they represent. These dictionaries are then used to determine the sentiment of the analyzed document by examining the frequencies and other measures of the negative and positive words in the text. The lexical approach
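The dictionary-based scoring described in this snippet can be sketched as follows. This is a minimal illustration only: the toy lexicon, its scores, and the function names lexicon_score and classify are assumptions made for the example and are not taken from the paper or from any published sentiment dictionary.

```python
import re

# Hypothetical toy lexicon: positive scores for positive terms,
# negative scores for negative terms. Real lexicons are far larger.
TOY_LEXICON = {
    "great": 1.0,
    "clean": 0.5,
    "rude": -1.0,
    "dirty": -0.5,
}


def lexicon_score(text: str, lexicon: dict = TOY_LEXICON) -> float:
    """Sum the scores of all lexicon terms that occur in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(text: str, lexicon: dict = TOY_LEXICON) -> str:
    """Map the summed score to a coarse sentiment label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The room was clean and the staff were great"))
print(classify("Dirty room, rude staff"))
```

A scorer of this kind ignores the context around each term, so a phrase such as "not great" would still count as positive here.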

The proposed approach

ConSent consists of two phases: learning and detection. During the learning phase we first generate a set of key terms and context terms from a training set. Then, we generate features based on these terms and train a classifier that will be used to analyze sentiment in the detection phase. The detection phase is designed as a classification task; we scan each document in search of the key and context terms, generate features based on the terms that were found, and use the classifier to
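The flow outlined in this snippet (select key terms, collect the context terms around them, then turn both into features) can be sketched roughly as below. This is not the authors' implementation: the key-term scoring used here (difference in per-class document frequency), the fixed token window, the feature naming, and all function names are simplifying assumptions chosen for illustration. The resulting feature dictionaries would then be passed to any supervised classifier.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def select_key_terms(docs, labels, top_k=3):
    """Stand-in key-term selection (an assumption, not the paper's method):
    rank terms by the absolute difference between the fraction of positive
    documents and the fraction of negative documents that contain them."""
    pos_df, neg_df = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for doc, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(set(tokenize(doc)))
    vocab = set(pos_df) | set(neg_df)
    score = {
        t: abs(pos_df[t] / max(n_pos, 1) - neg_df[t] / max(n_neg, 1))
        for t in vocab
    }
    # Ties are broken alphabetically to keep the result deterministic.
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:top_k])


def select_context_terms(docs, key_terms, window=2):
    """Collect non-key terms found within `window` tokens of a key term."""
    context = set()
    for doc in docs:
        tokens = tokenize(doc)
        for i, token in enumerate(tokens):
            if token in key_terms:
                nearby = tokens[max(0, i - window): i + window + 1]
                context.update(w for w in nearby if w not in key_terms)
    return context


def extract_features(doc, key_terms, context_terms, window=2):
    """Count key terms and (key term, nearby context term) pairs."""
    features = Counter()
    tokens = tokenize(doc)
    for i, token in enumerate(tokens):
        if token not in key_terms:
            continue
        features[f"key={token}"] += 1
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] in context_terms:
                features[f"key={token}|ctx={tokens[j]}"] += 1
    return dict(features)


train_docs = [
    "the service was not good",
    "the service was very good",
    "agent was rude and slow",
    "agent was friendly and fast",
]
train_labels = [0, 1, 0, 1]  # 1 = positive, 0 = negative (toy data)

key_terms = select_key_terms(train_docs, train_labels, top_k=6)
context_terms = select_context_terms(train_docs, key_terms)
print(extract_features("the agent was not friendly", key_terms, context_terms))
```

Because features are built only from key terms and their neighbours, tokens elsewhere in the document (for example, misrecognized words far from any key term) do not contribute, which is in the spirit of the robustness to noise the paper describes.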

Evaluation

In order to conduct a comprehensive evaluation of our method, we use several types of datasets from different domains. To evaluate our method on text with little or no noise we use hotel reviews from TripAdvisor and movie reviews from the Internet Movie Database (IMDB). For evaluation on the more difficult task of noisy and short texts we use data from Twitter and automatically transcribed phone calls made to the call centers of two large telecommunication companies.

It is also important to

Summary and future work

In this study we present ConSent—a novel context-based method for sentiment analysis. Our approach builds on techniques from the field of information retrieval in order to identify key terms which are indicative of sentiment and the context in which they appear. Our experiments demonstrate that ConSent significantly outperforms leading baselines both in standard and “noisy” text.

The strength of our approach stems both from its focus on key terms and its heavy reliance on context, which has the

References (97)

  • B. Pang et al., Opinion mining and sentiment analysis, Found. Trends Inform. Ret. (2008)
  • A. Agarwal et al., Sentiment analysis of twitter data
  • X. Wang et al., Topic sentiment analysis in twitter: a graph-based hashtag sentiment classification approach
  • F.F. Reichheld, Loyalty Rules!: How Today’s Leaders Build Lasting Relationships (2001)
  • B. Liem, H. Zhang, Y. Chen, An iterative dual pathway structure for speech-to-text transcription, in: Human...
  • B. Pang et al., Thumbs up?: Sentiment classification using machine learning techniques
  • A. Pak, P. Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, in: LREC,...
  • R. Zhang et al., Boosting statistical machine translation by lemmatization and linear interpolation
  • T. Wilson et al., Recognizing contextual polarity in phrase-level sentiment analysis
  • T. Wilson et al., Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis, Comput. Linguist. (2009)
  • P.D. Turney, Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews
  • V. Hatzivassiloglou et al., Predicting the semantic orientation of adjectives
  • M. Taboada, C. Anthony, K. Voll, Methods for creating semantic orientation dictionaries, in: Proceedings of the 5th...
  • H. Takamura, T. Inui, M. Okumura, Extracting semantic orientations of phrases from dictionary, in: HLT-NAACL,...
  • M. Hu et al., Mining and summarizing customer reviews
  • R.M. Tong, An operational system for detecting and tracking opinions in on-line discussion, in: Working Notes of the...
  • L. Polanyi et al., Contextual valence shifters
  • S. Poria et al., Fuzzy clustering for semi-supervised learning–case study: construction of an emotion lexicon
  • V. Hatzivassiloglou et al., Effects of adjective orientation and gradability on sentence subjectivity
  • J. Wiebe, Learning subjective adjectives from corpora, in: AAAI/IAAI,...
  • F. Benamara, C. Cesarano, A. Picariello, D.R. Recupero, V.S. Subrahmanian, Sentiment analysis: adjectives and adverbs...
  • S. Morinaga et al., Mining product reputations on the web
  • E. Cambria et al., New avenues in opinion mining and sentiment analysis, IEEE Intell. Syst. (2013)
  • Y. Park et al., Towards real-time measurement of customer satisfaction using automatically generated call transcripts
  • E. Boiy et al., A machine learning approach to sentiment analysis in multilingual Web texts, Inform. Ret. (2009)
  • P. Melville et al., Sentiment analysis of blogs by combining lexical knowledge with text classification
  • A. Balahur, R. Steinberger, M. Kabadjov, V. Zavarella, E. Van Der Goot, M. Halkia, B. Pouliquen, J. Belyaeva, Sentiment...
  • O. Chapelle et al., Choosing multiple parameters for support vector machines, Mach. Learn. (2002)
  • T. Mullen, N. Collier, Sentiment analysis using support vector machines with diverse information sources, in: EMNLP,...
  • Y. Choi et al., Learning with compositional semantics as structural inference for subsentential sentiment analysis
  • A.L. Maas et al., Learning word vectors for sentiment analysis
  • H. Wang et al., Latent aspect rating analysis on review text data: a rating regression approach
  • J. Read et al., Weakly supervised techniques for domain-independent sentiment classification
  • M. Potthast, Crowdsourcing a wikipedia vandalism corpus
  • X. Wan, Co-training for cross-lingual sentiment classification
  • R. Hallowell, The relationships of customer satisfaction, customer loyalty, and profitability: an empirical study, Int. J. Serv. Ind. Manage. (1996)
  • C. Homburg et al., Personal characteristics as moderators of the relationship between customer satisfaction and loyalty—an empirical analysis, Psychol. Market. (2001)
  • H. Takeuchi et al., Text mining of business-oriented conversations at a call center
