Sentimental causal rule discovery from Twitter

doi:10.1016/j.eswa.2014.02.024

Expert Systems with Applications

Volume 41, Issue 10, August 2014, Pages 4950-4958

https://doi.org/10.1016/j.eswa.2014.02.024

Highlights

• A new information extraction concept, sentimental causal rules, is proposed.
• Causal rules are used to infer a more concise summary of the case study.
• Results show the efficiency of sentimental causal rules in information extraction.
• The proposed method is applied to a collection of subject-related tweets.
• We investigate the Kurdish political issue in Turkey as the case study.

Abstract

Social media, especially Twitter, is now one of the most popular platforms where people can freely express their opinions. However, it is difficult to extract important summary information from the many millions of tweets sent every hour. In this work we propose a new concept, sentimental causal rules, together with techniques that combine sentiment analysis and causal rule discovery to extract such rules from textual data sources such as Twitter. Sentiment analysis refers to the task of extracting public sentiment from textual data. Its value lies in its ability to reflect popularly voiced perceptions that are stated in natural language. Causal rules, on the other hand, indicate associations between different concepts in a context where one or several concepts cause the others. We believe that sentimental causal rules are an effective summarization mechanism that combines the causal relations among different aspects extracted from textual data with the sentiment embedded in these causal relationships. To show the effectiveness of sentimental causal rules, we conducted experiments on Twitter data collected on the Kurdish political issue in Turkey, which has been an ongoing heated public debate for many years. Our experiments show that sentimental causal rule discovery is an effective method for summarizing information about important aspects of an issue on Twitter, which may further be used by politicians for better policy making.

Introduction

Social media platforms offer individuals the opportunity to articulate opinions on various topics ranging from consumer products and services to socio-political issues. These opinions are quite useful in various areas such as marketing for managers or policy making for government agencies.

Sentiment analysis aims to extract opinions towards a topic, generally from textual data sources. Causal rules, on the other hand, have been proposed in the literature as an information extraction method that expresses causalities among items in a database. In this paper we combine these two notions, i.e., sentiment analysis and causal rules, into “sentimental causal rules”, where causal rules based on various aspects extracted from textual data sources are associated with sentiment. We also propose techniques for discovering sentimental causal rules from textual data sources.

Sentimental causal rules are an efficient information extraction and summarization mechanism that condenses a large set of textual documents into a small set of rules which can be used for decision making. The problem tackled in this paper is building such an information extraction and summarization system, which motivated the methodology proposed here. Our main goal is to summarize a large collection of tweets by a small number (typically fewer than 100) of sentimental causal rules which can easily be studied by human beings. Such a system can be quite useful for companies and policy makers; for example, governments may be interested in knowing what people think about political issues.

Twitter is a popular microblogging and social networking website with a registered user base of around 650 million as of 2013, which allows its users to send text messages of at most 140 characters (tweets). Twitter users tweet about everyday subjects and, especially in recent years, use the platform to launch political campaigns. Although we applied the techniques described here to Twitter, they can be used for other types of data sources.

The rationale for choosing Twitter as the data source lies in its ability to provide a large number of tweets for the case study of this paper.

To demonstrate the effectiveness of sentimental causal rules, we chose the Kurdish issue in Turkey as the main topic and extracted sentimental causal rules from tweets on this issue.

We propose a four-step methodology for sentimental causal rule discovery:

• The first step is extracting aspect keywords from tweets, which will be the basis of sentimental causal rules.
• The second step is extracting causal rules among the extracted aspect keywords which frequently appear in tweets.
• The third step is identifying the polarity of tweets as positive, negative, or neutral (objective).
• Finally, the fourth step assigns polarity values to causal rules based on the aspect keywords in those rules and the polarity of the tweets which support those rules.
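The four steps above can be read as a small pipeline. The following is only a structural sketch: the function name `discover_sentimental_rules`, the `Rule` type, and the four pluggable callables are hypothetical names introduced for illustration, and the sketch says nothing about how each step is actually implemented in the paper.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical rule representation: (set of cause keywords, effect keyword).
Rule = Tuple[FrozenSet[str], str]


def discover_sentimental_rules(
    tweets: List[str],
    extract_aspects: Callable[[List[str]], List[str]],
    mine_rules: Callable[[List[str], List[str]], List[Rule]],
    classify: Callable[[str], int],
    assign: Callable[[Rule, List[str], List[int]], float],
) -> Dict[Rule, float]:
    """Wire the four steps together; each step is supplied by the caller."""
    aspects = extract_aspects(tweets)            # step 1: aspect keywords
    rules = mine_rules(tweets, aspects)          # step 2: causal rules
    polarities = [classify(t) for t in tweets]   # step 3: tweet polarity
    # step 4: attach a polarity value to every discovered rule
    return {rule: assign(rule, tweets, polarities) for rule in rules}
```

Keeping the steps as separate callables mirrors the description above, where keyword and rule extraction are independent of tweet polarity classification until the final assignment step.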

Tweets contain aspect keywords related to a topic, and these aspect keywords may carry a polarity which indicates the attitude of the user towards the aspect. Because tweets are short, we assume that the overall polarity of a tweet indicates the polarity of the aspect keywords that appear in it; in other words, the message conveyed by a tweet may include a few aspect keywords, and the polarity of the message covers the polarity of the included aspects.
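A minimal sketch of this assumption follows. It assumes tweets are already labeled +1 (positive), 0 (neutral), or -1 (negative), matches keywords by lowercased whitespace tokens, and aggregates by a simple mean; the function name `aspect_polarity` and the mean aggregation are illustrative choices, not the aggregation specified in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aspect_polarity(
    labeled_tweets: Iterable[Tuple[str, int]],
    aspects: Iterable[str],
) -> Dict[str, float]:
    """Give each aspect keyword the mean label of the tweets containing it."""
    aspect_set = {a.lower() for a in aspects}
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for text, label in labeled_tweets:
        tokens = set(text.lower().split())
        # Every aspect keyword in the tweet inherits the tweet's polarity.
        for aspect in aspect_set & tokens:
            totals[aspect] += label
            counts[aspect] += 1
    # Aspects that never occur are omitted rather than scored as neutral.
    return {aspect: totals[aspect] / counts[aspect] for aspect in counts}
```

With three invented tweets, `aspect_polarity([("syria attack condemned", -1), ("peace talks in syria", 1), ("attack on students", -1)], ["syria", "attack"])` yields 0.0 for `syria` and -1.0 for `attack`.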

Our overall methodology for sentimental causal rule discovery is presented in Fig. 1. The two branches in the flowchart indicate two different information extraction tasks (sentiment analysis and causal rule extraction), and the last step at the bottom combines those tasks to provide a more efficient method.

Although the majority of tweets about the Kurdish political issue in Turkey are in Turkish, in this paper we focus only on English tweets to get an idea of international opinion on the matter.

We first extracted aspect keywords from all tweets; these were later used in causal rule extraction. Our sentiment analysis system labeled tweets as positive, negative, or neutral. After completing aspect keyword extraction, rule extraction, and sentiment analysis on the tweets, we assigned polarities to the aspect keywords and rules.

Our main contribution to the state of the art is introducing the notion of sentimental causal rules together with a methodology to extract them from textual data sources. With pure sentiment analysis at the tweet level, the result would be the percentage of positive, negative, or neutral tweets, which by itself does not tell much. Likewise, pure causal rule extraction only gives causality relations between different aspects included in a dataset. With sentimental causal rules, however, we were able to extract more useful information and see towards which aspects (and concepts) Twitter users hold positive or negative opinions. The suggested approach provides summarized information tagged with polarity values. For example, “how much positive or negative sentiment an aspect such as Syria, or a concept included in a causal rule such as student, Kurds $\to$ attack, has gained from users on Twitter” is the summary of thousands of tweets.

Section snippets

Background and preliminaries

In this section, basic definitions and concepts are reviewed before introducing sentimental causal rules.

Association rules, which are the basis of causal rules, were proposed by Agrawal, Imieliński, and Swami (1993). Below, a special case of association rules (2-to-1 association rules) is defined based on the association rule definition of Agrawal et al. (1993).

Definition 1

Let $I = \{i_{1}, i_{2}, \dots, i_{n}\}$ be a set of binary attributes, called items, and $R = \{\tau_{1}, \tau_{2}, \dots, \tau_{n}\}$ be a set of transactions where each transaction $\tau_{i}$
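The definition is cut off in this snippet. As a hedged illustration only, the sketch below computes the standard support and confidence measures of Agrawal et al. (1993) for a 2-to-1 rule $\{a, b\} \to c$ over transactions represented as Python sets. The function names and the toy transactions are assumptions made for this example, and the measures describe association, not the causal test used later in the paper.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    if not transactions:
        return 0.0
    return sum(wanted <= t for t in transactions) / len(transactions)


def confidence(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: str,
) -> float:
    """support(antecedent plus consequent) divided by support(antecedent)."""
    antecedent_set = set(antecedent)
    base = support(transactions, antecedent_set)
    if base == 0.0:
        return 0.0  # the rule is never triggered in this data
    return support(transactions, antecedent_set | {consequent}) / base


# Toy transactions, invented for illustration.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
rule_support = support(toy, {"a", "b", "c"})      # 2 of 4 transactions
rule_confidence = confidence(toy, {"a", "b"}, "c")  # 2 of the 3 with a and b
```

Here the rule $\{a, b\} \to c$ has support 0.5 and confidence 2/3, since three transactions contain both antecedent items and two of those also contain the consequent.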

Related work

This section briefly addresses the most relevant research on sentiment analysis on Twitter and also causal rule inference in general.

Sentiment analysis on Twitter has recently attracted many researchers who competed in SemEval-2013 Task 2: sentiment analysis in Twitter (Nakov et al., 2013), a competition for tweet classification. Competitors classified tweets and SMS messages as positive, negative, or neutral. Different systems/models were proposed for the mentioned problem, each of which has

Sentimental causal rules

As mentioned earlier, causal rules express causality among the variables that participate in them. A real example from marketing is soya, mushroom $\to$ ketchup, meaning that buying soya and mushroom most likely causes the purchase of ketchup. After assigning polarity values to those rules, we call them sentimental causal rules. The formal definition of sentimental causal rules $(SCR)$ is as follows:

Let $K = \{{ak}_{1}, {ak}_{2}, \dots, {ak}_{n}\}$ be a set of aspect keywords extracted from a set of tweets $T = \{t_{1}, t_{2}, \dots, t_{m}\}$. A
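The formal definition is truncated in this snippet, so the following is only a sketch of the idea of attaching polarity to a rule, under stated assumptions: a tweet is taken to support a rule when it contains every cause keyword and the effect keyword (matched on lowercased whitespace tokens), tweet labels are +1, 0, or -1, and the rule polarity is the mean label of its supporting tweets. The function name `rule_polarity` and this aggregation are hypothetical, not the paper's formula.

```python
from typing import Iterable, List, Optional, Tuple


def rule_polarity(
    cause: Iterable[str],
    effect: str,
    labeled_tweets: Iterable[Tuple[str, int]],
) -> Optional[float]:
    """Mean polarity of the tweets that mention all keywords of the rule."""
    needed = {k.lower() for k in cause} | {effect.lower()}
    labels: List[int] = [
        label
        for text, label in labeled_tweets
        if needed <= set(text.lower().split())
    ]
    # None signals that no tweet supports the rule, which differs from neutral.
    return sum(labels) / len(labels) if labels else None
```

For the rule student, Kurds $\to$ attack and three invented tweets, two of which mention all three keywords with label -1, the function returns -1.0; a rule with no supporting tweets returns `None`.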

Experimental evaluation

In this section, the experimental setup for the proposed methodology (Section 4.2) is described, followed by the results and a discussion of them.

Conclusion and future work

In this paper we have tackled the problem of information summarization from large textual data while also incorporating the sentiment towards the summary information. For that purpose, we proposed a new concept, sentimental causal rules, which combines sentiment analysis and causal rules. We have also proposed a methodology for extracting sentimental causal rules from textual data sources.

Sentimental causal rules have many practical applications such as extracting summary information from customer

References (20)

L. Cavique. A scalable algorithm for the market basket analysis. Journal of Retailing and Consumer Services (2007)
E. Haddi et al. The role of text pre-processing in sentiment analysis. Procedia Computer Science (2013)
A. Agarwal et al. Sentiment analysis of twitter data
R. Agrawal et al. Mining association rules between sets of items in large databases
G.F. Cooper. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery (1997)
R. Dehkharghani et al. Adaptation and use of subjectivity lexicons for domain dependent sentiment classification
Dehkharghani, R., & Yilmaz, C. (2013). Automatically identifying a software product’s quality attributes through...
G. Gezici et al. Su-sentilab: A classification system for sentiment analysis in twitter
Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. CS224N Project...
Hou, W.-C. (2009). Quality of association rules by chi-squared test. In Encyclopedia of data warehousing and mining...

There are more references available in the full text version of this article.

