Sentimental causal rule discovery from Twitter

https://doi.org/10.1016/j.eswa.2014.02.024

Highlights

  • A new information extraction concept is proposed, entitled sentimental causal rules.

  • Causal rules were used to infer a more concise summary of the case study.

  • Results demonstrate the efficiency of sentimental causal rules for information extraction.

  • The proposed method is applied to a collection of subject-related tweets.

  • We investigated the Kurdish political issue in Turkey as the case study.

Abstract

Social media, especially Twitter, is now one of the most popular platforms where people can freely express their opinions. However, it is difficult to extract important summary information from the many millions of tweets sent every hour. In this work we propose a new concept, sentimental causal rules, together with techniques for extracting them from textual data sources such as Twitter by combining sentiment analysis and causal rule discovery. Sentiment analysis refers to the task of extracting public sentiment from textual data. Its value lies in its ability to reflect popularly voiced perceptions that are stated in natural language. Causal rules, on the other hand, indicate associations between different concepts in a context where one or several concepts cause the other(s). We believe that sentimental causal rules are an effective summarization mechanism that combines the causal relations among different aspects extracted from textual data with the sentiment embedded in these causal relationships. To show the effectiveness of sentimental causal rules, we conducted experiments on Twitter data collected on the Kurdish political issue in Turkey, which has been the subject of heated public debate for many years. Our experiments show that sentimental causal rule discovery is an effective method for summarizing information about important aspects of an issue on Twitter, which may further be used by politicians for better policy making.

Introduction

Social media platforms offer individuals the opportunity to articulate opinions on various topics ranging from consumer products and services to socio-political issues. These opinions are quite useful in various areas such as marketing for managers or policy making for government agencies.

Sentiment analysis aims to extract opinions towards a topic, generally from textual data sources. Causal rules, on the other hand, have been proposed in the literature as an information extraction method that expresses causalities among items in a database. In this paper we combine these two notions, sentiment analysis and causal rules, into “sentimental causal rules”, where causal rules based on various aspects extracted from textual data sources are associated with sentiment. We also propose techniques for discovering sentimental causal rules from textual data sources.

Sentimental causal rules are an efficient information extraction and summarization mechanism that condenses a large set of textual documents into a small set of rules which can be used for decision making. The problem tackled in this paper, building an information extraction and summarization system, is what motivated the current methodology. Our main goal is to summarize a large collection of tweets with a small number (typically fewer than 100) of sentimental causal rules that can easily be studied by human beings. Such a system can be quite useful for companies and policy makers; for example, governments may be interested in knowing what people think about political issues.

Twitter is a popular microblogging and social networking website with a registered user base of around 650 million as of 2013, which allows its users to send text messages of at most 140 characters (tweets). Twitter users tweet about everyday subjects and, especially in recent years, use the platform to launch political campaigns. Although we applied the proposed techniques to Twitter, they can be used for other types of data sources.

We chose Twitter as the data source because it provides a very large number of tweets for the case study of this paper.

In order to demonstrate the effectiveness of sentimental causal rules, we chose the Kurdish issue in Turkey as the main topic and extracted sentimental causal rules from tweets on this issue.

We propose a four-step methodology for sentimental causal rule discovery:

  • The first step is extracting aspect keywords from tweets, which will be the basis of sentimental causal rules.

  • The second step is extracting causal rules among the extracted aspect keywords that frequently appear in tweets.

  • The third step is identifying the polarity of tweets as positive, negative, or neutral (objective).

  • Finally, the fourth step assigns polarity values to causal rules based on the aspect keywords in those rules and the polarity of the tweets that support those rules.
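The first two steps can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: aspect keywords are picked by a simple tweet-frequency threshold, and candidate 2-to-1 rules are filtered by support and confidence as a stand-in for the causality test, which this excerpt does not describe. The tweets, the stop list, the function names, and the thresholds are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy, made-up tweets (not from the paper's dataset).
tweets = [
    "student kurds attack police",
    "kurds attack student protest",
    "student kurds attack",
    "syria border kurds",
    "student protest police",
]

STOPWORDS = {"the", "a", "in", "on"}  # hypothetical, tiny stop list


def extract_aspect_keywords(tweets, min_count=2):
    """Step 1 (assumed heuristic): keep words found in at least min_count tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(set(text.lower().split()) - STOPWORDS)
    return {word for word, count in counts.items() if count >= min_count}


def candidate_rules(tweets, keywords, min_support=0.4, min_confidence=0.6):
    """Step 2 (stand-in): 2-to-1 rules (a, b) -> c kept by support and confidence.

    Support and confidence measure association only; they are not a causality test.
    """
    transactions = [set(t.lower().split()) & keywords for t in tweets]
    n = len(transactions)
    rules = []
    for a, b in combinations(sorted(keywords), 2):
        body = sum(1 for t in transactions if a in t and b in t)
        if body == 0:
            continue
        for c in sorted(keywords - {a, b}):
            both = sum(1 for t in transactions if {a, b, c} <= t)
            support = both / n
            confidence = both / body
            if support >= min_support and confidence >= min_confidence:
                rules.append(((a, b), c, support, confidence))
    return rules


keywords = extract_aspect_keywords(tweets)
rules = candidate_rules(tweets, keywords)
for (a, b), c, support, confidence in rules:
    print(f"{a}, {b} -> {c}  support={support:.2f} confidence={confidence:.2f}")
```

A real system would replace the confidence filter with an explicit causality test and a more careful keyword extractor; the sketch only shows how the output of step 1 feeds step 2.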

Tweets contain aspect keywords related to a topic, and these aspect keywords may carry a polarity that indicates the attitude of the user towards the aspect. Because tweets are short, we assume that the overall polarity of a tweet indicates the polarity of the aspect keywords appearing in it; in other words, the message conveyed by a tweet may include a few aspect keywords, and the polarity of the message covers the polarity of the included aspects.
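Under this assumption, steps three and four can be sketched as follows. The labeled tweets are made up, the function name `rule_polarity` is hypothetical, and scoring a rule by the share of each polarity among its supporting tweets is an assumption made for illustration; the excerpt does not give the paper's actual assignment formula.

```python
from collections import Counter

# Hypothetical tweets, already labeled by some sentiment classifier (step 3).
# Each entry: (aspect keywords found in the tweet, tweet-level polarity).
labeled_tweets = [
    ({"student", "kurds", "attack"}, "negative"),
    ({"student", "kurds", "attack", "police"}, "negative"),
    ({"kurds", "student", "attack", "protest"}, "neutral"),
    ({"syria", "kurds"}, "positive"),
]


def rule_polarity(rule, labeled_tweets):
    """Step 4 sketch: polarity shares among tweets that contain every rule keyword."""
    antecedent, consequent = rule
    needed = set(antecedent) | {consequent}
    votes = Counter(label for words, label in labeled_tweets if needed <= words)
    total = sum(votes.values())
    if total == 0:
        return {}  # no supporting tweets, so no polarity can be assigned
    return {label: count / total for label, count in votes.items()}


rule = (("student", "kurds"), "attack")
shares = rule_polarity(rule, labeled_tweets)
majority = max(shares, key=shares.get)
print(shares, majority)
```

Returning the full distribution rather than a single label keeps the information about how contested a rule is; a majority label can always be derived from it afterwards.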

Our overall methodology for sentimental causal rule discovery is presented in Fig. 1. The two branches in the flowchart indicate two different information extraction tasks (sentiment analysis and causal rule extraction), and the last step at the bottom combines those tasks to provide a more efficient method.

Although the majority of tweets about the Kurdish political issue in Turkey are in Turkish, in this paper we have focused only on English tweets to get an idea of international opinion on the matter.

We first extracted aspect keywords from all tweets; these were later used in causal rule extraction. Our sentiment analysis system labeled tweets as positive, negative, or neutral. After completing aspect keyword extraction, rule extraction, and sentiment analysis on the tweets, we assigned polarities to the aspect keywords and the rules.

Our main contribution to the state of the art is the notion of sentimental causal rules, together with a methodology to extract them from textual data sources. Pure sentiment analysis at the tweet level would yield only the percentages of positive, negative, and neutral tweets, which by themselves do not tell much. Likewise, pure causal rule extraction gives only the causality relations between different aspects in a dataset. With sentimental causal rules, however, we were able to extract more useful information and see which aspects (and concepts) Twitter users hold positive or negative opinions about, and why. The suggested approach provides summarized information tagged with polarity values. For example, “how much positive or negative sentiment an aspect such as Syria, or a concept included in a causal rule such as student, Kurds → attack, has gained from users on Twitter” is the summary of thousands of tweets.
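As a rough illustration of an aspect-level summary of this kind, the sketch below counts tweet polarities per aspect keyword and prints the shares. The data is made up and the function name `aspect_polarity_counts` is hypothetical; it assumes each tweet's polarity applies to every aspect keyword in it and is not the paper's implementation.

```python
from collections import Counter, defaultdict

# Made-up labeled tweets: (aspect keywords found in the tweet, tweet polarity).
labeled_tweets = [
    ({"syria", "kurds"}, "positive"),
    ({"syria", "border"}, "negative"),
    ({"syria"}, "negative"),
    ({"kurds", "student"}, "neutral"),
]


def aspect_polarity_counts(labeled_tweets):
    """Count tweet polarities per aspect keyword (tweet polarity applied to each aspect)."""
    counts = defaultdict(Counter)
    for aspects, label in labeled_tweets:
        for aspect in aspects:
            counts[aspect][label] += 1
    return counts


counts = aspect_polarity_counts(labeled_tweets)
for aspect in sorted(counts):
    total = sum(counts[aspect].values())
    summary = ", ".join(
        f"{label} {count / total:.0%}"
        for label, count in sorted(counts[aspect].items())
    )
    print(f"{aspect}: {summary}")
```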

Section snippets

Background and preliminaries

In this section, basic definitions and concepts are reviewed before introducing sentimental causal rules.

Association rules, which are the basis of causal rules, were proposed by Agrawal, Imieliński, and Swami (1993). Below, a special case of association rules (2-to-1 association rules) is defined based on the association rule definition of Agrawal et al. (1993).

Definition 1

Let $I=\{i_1,i_2,\ldots,i_n\}$ be a set of binary attributes, called items, and $R=\{\tau_1,\tau_2,\ldots,\tau_n\}$ be a set of transactions where each transaction $\tau_i$

Related work

This section briefly addresses the most relevant research on sentiment analysis on Twitter and also causal rule inference in general.

Sentiment analysis on Twitter has recently attracted many researchers who competed in SemEval-2013 Task 2: sentiment analysis in Twitter (Nakov et al., 2013), a competition for tweet classification. Competitors classified tweets and SMS messages as positive, negative, or neutral. Different systems/models were proposed for the mentioned problem, each of which has

Sentimental causal rules

As mentioned earlier, causal rules express causality among the variables that participate in them. A real example from marketing is soya, mushroom → ketchup, meaning that buying soya and mushroom most likely causes the purchase of ketchup. After assigning polarity values to such rules, we call them sentimental causal rules. The formal definition of sentimental causal rules (SCR) is as follows:

Let $K=\{ak_1,ak_2,\ldots,ak_n\}$ be a set of aspect keywords extracted from a set of tweets $T=\{t_1,t_2,\ldots,t_m\}$. A

Experimental evaluation

In this section, the experimental setup for the proposed methodology (Section 4.2) is described, followed by the results and a discussion of them.

Conclusion and future work

In this paper we have tackled the problem of summarizing information from large textual data while also incorporating the sentiment towards the summary information. For that purpose, we proposed a new concept, sentimental causal rules, which combines sentiment analysis and causal rules. We have also proposed a methodology for extracting sentimental causal rules from textual data sources.

Sentimental causal rules have many practical applications such as extracting summary information from customer

References (20)

  • L. Cavique. A scalable algorithm for the market basket analysis. Journal of Retailing and Consumer Services (2007)
  • E. Haddi et al. The role of text pre-processing in sentiment analysis. Procedia Computer Science (2013)
  • A. Agarwal et al. Sentiment analysis of twitter data
  • R. Agrawal et al. Mining association rules between sets of items in large databases (1993)
  • G.F. Cooper. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery (1997)
  • R. Dehkharghani et al. Adaptation and use of subjectivity lexicons for domain dependent sentiment classification
  • Dehkharghani, R., & Yilmaz, C. (2013). Automatically identifying a software product’s quality attributes through...
  • G. Gezici et al. Su-sentilab: A classification system for sentiment analysis in twitter
  • Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project...
  • Hou, W.-C. (2009). Quality of association rules by chi-squared test. In Encyclopedia of data warehousing and mining...
There are more references available in the full text version of this article.

