
The adoption of business tactics based on reviews and discussions in social media has gained much importance in recent times. The sentiments of stakeholders are taken into consideration during planning and decision making. Sentiments can be classified as positive, negative, or neutral. The algorithms available for sentiment analysis focus mostly on the presence of positively oriented words or phrases. The majority of sentiment analysis algorithms are likely to fail when sarcasm is present in the text. The presence of sarcasm in textual data such as tweets, reviews, and discussions poses challenges to automated systems for identifying the actual sentiment [10]. In textual data, detecting sarcasm is difficult due to the lack of intonation and facial expressions. In fact, according to a BBC report, the U.S. Secret Service was looking for a software system that could detect sarcasm in social media data [2]. Therefore, an automated system is required for sarcasm detection in text.

Due to the restriction on a tweet’s length (140 characters), users often use symbolic notations such as smileys, emoticons, @User, etc. to accommodate more information. While posting a tweet, people often include videos, images, #hashtags, etc. along with the text to indicate the context of the text in the tweet. This context conveys visual information that cannot be expressed through text. To make the tweet self-explanatory, users add a few more features, such as #trending, @User, RT, etc. These features make tweets unique among social media texts. According to Davidov et al. [7], around 20–25% of tweets downloaded from Twitter fall into one of the following three categories.

  1. A tiny tweet with a length of up to three or four words.

  2. A tweet that contains only handles, i.e., @, RT, URL, and #tag.

  3. An indirect tweet that depends on either videos or images to convey the theme of the tweet.

In this article, these tweets are referred to as null tweets; they often lack the features within a tweet that are important for sarcasm detection. The context within a tweet may be topical, historical, temporal, or situational [1, 8, 9, 13]. The conventional method of collecting sarcastic tweets is hashtag-based distant supervision using #sarcasm and #sarcastic.

Some past studies on sarcasm detection in tweets are based on context. They have used features such as relationships, chains of conversations, inter-sentential incongruity, and embedded multimedia posts [1, 8, 9, 11, 13]. To identify sarcasm based on this context, they require additional information such as the user’s profile, chat history, the cohesion of sentences, etc. These context-based approaches are likely to fail to identify a sarcastic tweet when only a single tweet is available for detecting the context. This article exploits the context within a tweet and proposes a rule-based approach using a list of manually collected hashtag words and emoticons, shown in Table 1, that play the role of the context within a tweet to identify sarcasm. These hashtag words and emoticons are usually appended by the user at the end of the tweet to indicate the topical or situational context.

Table 1. Dictionary used for sarcasm detection in tweets.

In Table 1, the hashtag words and emoticons act as a guiding factor for polarizing the orientation of the tweet, and they can be considered the context of the tweet. For example, consider “Super easy to focus at work today #kidding”. The sentiment of this tweet seems positive, but the “#kidding” appended at the end acts as context for this particular tweet and flips its sentiment to negative. It indicates that the user intentionally wrote the tweet to be sarcastic. A sample list of such sarcastic tweets is given in Table 2. These sarcastic tweets are based on the “negation words”, “hashtag words”, and “emoticons” dictionaries shown in Table 1. In Table 2, the “negation words”, “hashtag words”, and “emoticons” are indicated in underlined italics.

Table 2. Hashtag and Negation word based Sarcastic Tweets

The contributions of this article are as follows:

  1. An algorithm is proposed to detect and eliminate null tweets automatically.

  2. An algorithm is proposed to detect sarcastic tweets using manually collected dictionaries of hashtag words, negation words, and emoticons, as shown in Table 1.

  3. The proposed approach is evaluated experimentally, and it is observed that eliminating null tweets also significantly improves the performance of some existing sarcasm detection systems.

The rest of this paper is organized as follows. Section 1 presents related work on sarcasm detection in Twitter data. The proposed scheme is described in Sect. 2. Section 3 presents the performance analysis of the proposed scheme. Finally, the conclusions are drawn in Sect. 4.

1 Related Work

In the current era, research on sarcasm detection in text has grown rapidly [1, 3–9, 11–13]. This research emphasizes analyzing sentiment in text data in the presence of sarcasm. Social media content such as tweets often carries sarcasm. Past work on sarcasm detection relies on several classification techniques. The complete classification of sarcasm detection techniques is shown in Fig. 1.

Fig. 1. Various classification approaches used for sarcasm detection in text.

This article focuses on one of the supervised methods given in Fig. 1, i.e., context-based sarcasm detection. In the literature, this method has been applied to several kinds of textual context, such as topical, situational, relational, and historical.

1.1 Context-Based Approach

The relationship between an author and the audience, followed by the immediate communicative context, can help to improve sarcasm prediction accuracy [1]. A context-based model has been used for message-level sarcasm detection on Twitter [13]. A framework based on the linguistic theory of context incongruity, which introduces inter-sentential incongruity by considering the history of the posts in the discussion thread, was considered for sarcasm detection [8]. Quantitative evidence indicates that the historical tweets of an author can provide additional context for sarcasm detection [9]. The author’s past sentiment on the entities in a tweet was exploited to detect sarcastic intent. Chains of tweets that work in a context were also considered, and a complex classification model was introduced that works over an entire tweet sequence rather than on one tweet at a time. The integration of linguistic features with contextual features extracted from the analysis of visuals embedded in multimodal posts was deployed for sarcasm detection [11].

2 Proposed Scheme

This section describes the process of eliminating null tweets (tweet filtration), followed by sarcastic sentiment detection in the filtered tweets using the negation word, emoticon, and hashtag word dictionaries.

2.1 Null Tweets Detection and Filtration

Preprocessing is an important step in sarcastic sentiment detection in Twitter data. Conventional preprocessing usually eliminates the trending information, i.e., hashtags, URLs of videos and images, re-tweets, and @user information, and converts uppercase words to lowercase. For example, the tweet “yeah, right! #sarcasm!” becomes “yeah, right!” after conventional preprocessing. However, this article performs an additional preprocessing step, which is detailed in Algorithm 1.
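The conventional preprocessing described above can be illustrated with a minimal sketch. The function name `conventional_preprocess`, the whitespace tokenization, and the handling of `https://` links are assumptions made for illustration; the source does not specify an implementation.

```python
def conventional_preprocess(tweet: str) -> str:
    """Drop hashtags, @mentions, RT markers and URLs, then lowercase.

    Hypothetical helper: tokenization by whitespace is an assumption.
    """
    kept = []
    for token in tweet.split():
        lowered = token.lower()
        # Trending information, user handles and links are removed.
        if lowered.startswith(("#", "@", "http://", "https://")):
            continue
        # Re-tweet marker.
        if lowered == "rt":
            continue
        kept.append(lowered)
    return " ".join(kept)


print(conventional_preprocess("yeah, right! #sarcasm!"))  # yeah, right!
```

Note that the hashtag that marked the tweet as sarcastic is lost in this step, which is why the remaining text can end up carrying too little context.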

Algorithm 1. Automatic detection and elimination of null tweets.

According to Algorithm 1, the tweet “yeah, right!” is considered a null tweet, so it is eliminated to enhance the accuracy of the proposed system. Algorithm 1 shows the procedure for automatic detection and elimination of null tweets in the tweet corpus. It takes the tweet corpus (C) as input, tokenizes each tweet, and stores the tokens in the list of tokens (LOT) file. It also counts the total number of tokens in each tweet. If the token count is less than or equal to three, the tweet is a null tweet and is discarded. Otherwise, if any token in LOT starts with http://, the tweet is a null tweet because it depends on some other source to convey its meaning (such tweets are called referred tweets). Similarly, if the tokens in LOT consist only of handles such as @, #tag, and RT, with no text, the tweet is a null tweet. If a tweet meets none of the three conditions given in Algorithm 1, it is a valid tweet for sarcasm analysis and is stored in the list of filtered tweets (LOFT) file.
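The three conditions of the null tweet filter can be sketched as follows. This is an illustrative reading of the description rather than the authors' implementation: the names `is_handle`, `is_null_tweet`, and `filter_null_tweets` are hypothetical, tokens are split on whitespace, and `https://` links are treated like `http://` links, which is an assumption beyond the text.

```python
def is_handle(token: str) -> bool:
    """True for @user, #tag and RT tokens (hypothetical helper)."""
    return token.startswith(("@", "#")) or token.upper() == "RT"


def is_null_tweet(tweet: str) -> bool:
    """Apply the three null tweet conditions in order."""
    tokens = tweet.split()  # assumption: whitespace tokenization
    # Condition 1: tiny tweet of three tokens or fewer.
    if len(tokens) <= 3:
        return True
    # Condition 2: referred tweet that points to an external source.
    if any(t.lower().startswith(("http://", "https://")) for t in tokens):
        return True
    # Condition 3: only handles, no text.
    if all(is_handle(t) for t in tokens):
        return True
    return False


def filter_null_tweets(corpus: list[str]) -> list[str]:
    """Return the list of filtered tweets (LOFT) for a corpus C."""
    return [tweet for tweet in corpus if not is_null_tweet(tweet)]


corpus = [
    "yeah, right!",
    "@a @b #c RT #d",
    "look at this picture http://t.co/abc lol",
    "Super easy to focus at work today #kidding",
]
print(filter_null_tweets(corpus))
```

In this sketch only the last tweet survives filtration, since each of the first three triggers one of the conditions.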

2.2 Proposed Algorithm for Sarcasm Detection

The proposed approach is based on the context within a tweet, which is extracted using three dictionaries, namely negation words, hashtag words, and emoticons, as shown in Table 1. A negation word can invert the polarity of a word when placed before it as a prefix, whereas a hashtag word can flip the polarity of the entire sentence when appended as a suffix. For example, in “not happy”, happy is a positive word, but prefixing it with ‘not’ makes its polarity negative. Similarly, consider the hashtag word in “This is a perfect solution for sarcasm detection #not”. Without the hashtag word “#not”, the sentiment of this tweet is positive. However, once “#not” is appended at the end of the tweet, the overall polarity of the tweet is inverted from positive to negative. Finally, emoticons are also capable of reversing a tweet’s polarity. For instance, in “I see the diet is going well!”, the tweet seems positive, but the emoticons make it sarcastic. According to the Macmillan English Dictionary, “#not” is capable of flipping the meaning of a text. Hence, it acts as context that makes the tweet sarcastic. These hashtag words are capable of making a sentence sarcastic under certain constraints.

Algorithm 2. Sarcasm detection in a filtered tweet using the hashtag word and emoticon dictionaries.

Table 3. Manually annotated dataset for testing

The process of identifying a sarcastic tweet is given in Algorithm 2. It explains the step-wise procedure for sarcasm detection in a single filtered tweet based on the context provided by the hashtag word and emoticon dictionaries. Algorithm 2 takes the filtered tweets (LOFT) as input, determines the sentiment value of each tweet, and stores it in \(\delta \). Subsequently, the last bunch of hashtag words, which is usually appended after the text part of the tweet, is stored in \(\lambda \). If a hashtag word appears within the text part, the algorithm removes the hash symbol and treats it as an ordinary word. Further, it tokenizes the tweet and checks for the presence of negation words. If a negation word is found, the algorithm flips the sentiment value of the tweet and examines the new sentiment value. If the new sentiment value is positive or neutral and any \(\lambda \) value is present in the hashtag dictionary, the tweet is classified as sarcastic; otherwise, it is classified as non-sarcastic. If no negation word is found in the text part of the tweet, the sentiment value of the tweet is positive, and any \(\lambda \) value is present in the hashtag dictionary, the tweet is classified as sarcastic; otherwise, it is considered non-sarcastic.
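The decision rules described above can be sketched in a few lines. Everything concrete here is an assumption for illustration: the dictionaries are small hypothetical stand-ins for Table 1, the sentiment value \(\delta \) comes from a toy word-count lexicon because the source does not say how sentiment is computed, and emoticon matching is left out.

```python
import string

# Hypothetical stand-ins for the Table 1 dictionaries.
HASHTAG_WORDS = {"not", "kidding", "sarcasm", "sarcastic"}
NEGATION_WORDS = {"not", "no", "never"}
# Toy sentiment lexicon (assumption; not from the source).
POSITIVE = {"easy", "perfect", "happy", "well", "great"}
NEGATIVE = {"bad", "awful", "terrible"}


def split_tweet(tweet: str) -> tuple[list[str], list[str]]:
    """Return (text tokens, trailing hashtag words lambda)."""
    tokens = tweet.split()
    trailing = []
    # The last bunch of hashtags appended after the text part.
    while tokens and tokens[-1].startswith("#"):
        trailing.append(tokens.pop().strip(string.punctuation).lower())
    # Hashtags inside the text lose '#' and are treated as words.
    text = [t.strip(string.punctuation).lower() for t in tokens]
    return [t for t in text if t], trailing


def sentiment(tokens: list[str]) -> int:
    """Toy sentiment delta: +1 positive, 0 neutral, -1 negative."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return (score > 0) - (score < 0)


def is_sarcastic(tweet: str) -> bool:
    text, trailing = split_tweet(tweet)
    delta = sentiment(text)
    has_context = any(tag in HASHTAG_WORDS for tag in trailing)
    if any(t in NEGATION_WORDS for t in text):
        delta = -delta  # a negation word flips the sentiment value
        return delta >= 0 and has_context
    return delta > 0 and has_context


print(is_sarcastic("Super easy to focus at work today #kidding"))
print(is_sarcastic("Super easy to focus at work today"))
```

With these stand-in dictionaries, the first call classifies the tweet as sarcastic and the second does not, because the second tweet has no trailing hashtag word to act as context.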

3 Experimental Results

In this work, the experiment was evaluated using three statistical parameters, namely precision, recall, and F1-score. The evaluation starts with dataset annotation, followed by performance analysis using a confusion matrix.
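For reference, the three parameters follow from the confusion matrix counts in the standard way. The sketch below uses the usual definitions; the function name `precision_recall_f1` and the counts in the usage line are made up for illustration and are not taken from Table 4.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1-score from confusion matrix counts.

    tp: sarcastic tweets classified as sarcastic
    fp: non-sarcastic tweets classified as sarcastic
    fn: sarcastic tweets classified as non-sarcastic
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (not the paper's results).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f1, 2))
```

True negatives do not enter any of the three measures, so they are not needed as input.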

3.1 Dataset Collection and Annotation

In this article, we manually collected and annotated 3000 tweets from Twitter using the various hashtags, negation words, and emoticons given in Table 1. The manually annotated dataset (MADS) is shown in Table 3. We observed that 437 tweets were unpredictable during annotation. Most of the unpredictable tweets lacked the context of sarcasm and met the criteria of a null tweet, so they were treated as null tweets.

3.2 Performance Analysis

The performance of the proposed algorithm for sarcasm detection was analyzed using rule-based classification. The experimental results of the proposed algorithm are given in Tables 4 and 5. Table 4 describes the confusion matrix for error analysis, and Table 5 shows the attained precision, recall, and F1-score.

Table 4. Confusion matrix of proposed approaches for error analysis

Table 5. Comparison of precision, recall, and F1-score of proposed approaches with some of the existing work.

4 Conclusion

This article addresses two things. The first is a process to detect and eliminate null tweets, which can be considered a preprocessing step that removes tweets acting as noise in the dataset. The second is a sarcasm detection algorithm based on the context within a tweet, in which the properties of hashtags and emoticons are exploited as the context for identifying a tweet as sarcastic. The proposed algorithm was implemented and evaluated with and without null tweets in the dataset. Some state-of-the-art algorithms for sarcasm detection were also evaluated in the same manner. It is observed that the sarcasm detection algorithms perform better after null tweets are filtered from the dataset.