An Improved Approach for Sarcasm Detection Avoiding Null Tweets

Bharti, Santosh Kumar; Babu, Korra Sathya; Mishra, Sambit Kumar

doi:10.1007/978-3-030-34872-4_29

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11942)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

Among the plethora of social media platforms, Twitter has emerged as a favorite destination for researchers in recent times. Many researchers are inclined to work on Twitter because of the availability of massive numbers of tweets and its unique features, such as hashtags and short messages. Various recent studies have used the hashtags #sarcasm and #sarcastic to collect Twitter datasets for sarcasm detection. However, hashtag-based distant supervision suffers from the inclusion of null tweets in the datasets, which is a critical problem for sarcasm detection. In this article, an algorithm is proposed for the automatic detection and filtration of null tweets in Twitter data. Additionally, an algorithm to identify sarcastic tweets using the context within a tweet is proposed. This approach uses dictionaries of handpicked hashtag words and emoticons as the context within a tweet. Finally, we deployed a rule-based algorithm to analyse the performance of the proposed approach. The proposed approach attains an accuracy of 97.3% (after filtering null tweets) and 83.13% (without filtering null tweets) using a rule-based approach. The results show that the performance of the proposed system improves significantly after the elimination of null tweets.

The adoption of business tactics based on reviews and discussions in social media has gained much importance in recent times. The sentiments of the stakeholders are taken into consideration during planning and decision making. Sentiments can be classified as positive, negative, or neutral. The algorithms available for sentiment analysis focus mostly on the presence of positive deviated words or phrases. The majority of sentiment analysis algorithms are likely to fail when sarcasm is present in the text. The presence of sarcasm in textual data such as tweets, reviews, and discussions poses challenges to automated systems for identifying the actual sentiment [10]. In textual data, sarcasm is hard to detect because intonation and facial expressions are absent. In fact, according to a BBC report, the U.S. Secret Service was looking for a software system that could detect sarcasm in social media data [2]. Therefore, an automated system is required for sarcasm detection in text.

Due to the restriction on a tweet’s length (140 characters), users often use symbolic notations such as smileys, emoticons, @User, etc. to accommodate more information. While posting a tweet, people often include videos, images, #hashtags, etc. along with the text to indicate the context of the text in the tweet. This context conveys visual information that cannot be expressed through text. To make a tweet self-explanatory, users add a few more features, such as #trending, @User, RT, etc. These features make tweets unique among social media texts. According to Davidov et al. [7], around 20–25% of tweets fall into one of the following three categories after they are downloaded from Twitter.

1. A tiny tweet with a length of up to three or four words.
2. A tweet that contains only handles, i.e., @, RT, URL, and #tag.
3. An indirect tweet that depends on either videos or images to convey the theme of the tweet.

In this article, these tweets are referred to as null tweets; they often lack the features within a tweet that are important for sarcasm detection. The context within a tweet may be topical, historical, temporal, or situational [1, 8, 9, 13]. The conventional method of collecting sarcastic tweets is hashtag-based distant supervision using #sarcasm and #sarcastic.

Some past studies on sarcasm detection in tweets are based on context. They have used features such as relationships, the chain of conversations, inter-sentential incongruity, and embedded multimedia posts [1, 8, 9, 11, 13]. To identify sarcasm based on this context, they require additional information such as the user’s profile, chat history, the cohesion of sentences, etc. These context-based approaches are likely to fail to identify a sarcastic tweet when only a single tweet is available for detecting the context. This article exploits the context within a tweet and proposes a rule-based approach that uses a list of manually collected hashtag words and emoticons, shown in Table 1, which play the role of the context within a tweet to identify sarcasm. These hashtag words and emoticons are usually appended by the user at the end of the tweet to indicate the topical or situational context.

Table 1. Dictionary used for sarcasm detection in tweets.

In Table 1, the hashtag words and emoticons act as a guiding factor for polarizing the orientation of the tweet. They can be considered the context of the tweet. For example: “Super easy to focus at work today #kidding”. In this example, the tweet sentiment seems positive, but the “#kidding” appended at the end acts as the context for this particular tweet. Because of this hashtag, the sentiment of the tweet flips to negative. It indicates that the user intentionally wrote the tweet to be sarcastic. Similarly, a sample list of such sarcastic tweets is given in Table 2. These sarcastic tweets are based on the “negation words”, “hashtag words” and “emoticons” dictionaries shown in Table 1. In Table 2, the “negation words”, “hashtag words” and “emoticons” are indicated in underlined italics.

Table 2. Hashtag and Negation word based Sarcastic Tweets

The contributions of this article are as follows:

1. An algorithm is proposed to detect and eliminate null tweets automatically.
2. An algorithm is proposed to detect sarcastic tweets using manually collected dictionaries that include hashtag words, negation words, and emoticons, as shown in Table 1.
3. The proposed approach is evaluated experimentally, and it is observed that after eliminating null tweets, the performance of sarcasm detection improves significantly, including for some of the existing systems.

The rest of this paper is organized as follows. Section 1 presents related work on sarcasm detection in Twitter data. The proposed scheme has been described in Sect. 2. Section 3 presents the performance analysis of the proposed schemes. Finally, the conclusions are drawn in Sect. 4.

1 Related Work

In the current era, research on sarcasm detection in text has grown rapidly [1, 3–9, 11–13]. The objective of this research is to analyze sentiment in text data in the presence of sarcasm. The content of social media, such as tweets, often carries sarcasm. Past sarcasm detection work relies on several classification techniques. The complete classification of sarcasm detection techniques is shown in Fig. 1.

This article focuses on one of the supervised methods given in Fig. 1, i.e., context-based sarcasm detection. This method has been used with several kinds of text context in the literature, such as topical, situational, relational, and historical context.

1.1 Context-Based Approach

The relationship between an author and the audience, followed by the immediate communicative context, can help improve sarcasm prediction accuracy [1]. A context-based model was used for message-level sarcasm detection on Twitter [13]. A framework based on the linguistic theory of context incongruity, which introduces inter-sentential incongruity by considering the history of posts in the discussion thread, was considered for sarcasm detection [8]. Quantitative evidence from an author’s historical tweets can provide additional context for sarcasm detection [9]. The author’s past sentiment on the entities in a tweet was exploited to detect the sarcastic intent. Chains of tweets that work in a context were considered. They introduce a complex classification model that works over an entire tweet sequence and not on one tweet at a time. The integration of linguistic features with contextual features extracted from the analysis of visuals embedded in multimodal posts was deployed for sarcasm detection [11].

2 Proposed Scheme

This section describes the process of eliminating null tweets (tweet filtration), followed by sarcastic sentiment detection in the filtered tweets using the negation word, emoticon, and hashtag word dictionaries.

2.1 Null Tweets Detection and Filtration

Preprocessing is an important step in sarcastic sentiment detection in Twitter data. Conventional preprocessing usually eliminates the trending information, i.e., hashtags, URLs of videos and images, re-tweets, and @user information, and converts uppercase words to lowercase. For example, after conventional preprocessing, the tweet “yeah, right! #sarcasm!” becomes “yeah, right!”. However, this article performs an additional preprocessing step, which is detailed in Algorithm 1.
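
The conventional preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `conventional_preprocess` and the regular expressions are assumptions, and punctuation glued to a removed hashtag or mention is dropped together with it so that the example from the text is reproduced.

```python
import re


def conventional_preprocess(tweet: str) -> str:
    """Strip URLs, the RT marker, @mentions and #hashtags, then lowercase.

    Hypothetical helper: the patterns below are one plausible reading of
    "conventional preprocessing", not rules taken from the paper.
    """
    text = re.sub(r"https?://\S+", " ", tweet)      # URLs of videos and images
    text = re.sub(r"\bRT\b", " ", text)             # re-tweet marker (case-sensitive)
    text = re.sub(r"[@#]\w+[^\w\s]*", " ", text)    # @user / #hashtag plus attached punctuation
    return re.sub(r"\s+", " ", text).strip().lower()


print(conventional_preprocess("yeah, right! #sarcasm!"))
```

With these patterns, the example tweet is reduced to its bare text part, which is what the null-tweet filter then has to judge.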

According to Algorithm 1, the tweet “yeah, right!” is considered a null tweet, so it is eliminated to enhance the accuracy of the proposed system. Algorithm 1 gives the procedure for the automatic detection and elimination of null tweets in the tweet corpus. It takes the tweet corpus (C) as input, tokenizes each tweet, and stores the tokens in the list of tokens (LOT) file. It also counts the total number of tokens in each tweet. If the token count is less than or equal to three, the tweet is a null tweet and is discarded. Otherwise, if any token in LOT starts with HTTP://, the tweet is a null tweet because it depends on some other source to convey its meaning (such tweets are called referred tweets). Similarly, if the tokens in LOT contain only handles such as @, #tag, and RT, and no text, the tweet is a null tweet. If a tweet does not satisfy any of the three conditions given in Algorithm 1, it is a valid tweet for sarcasm analysis and is stored in the list of filtered tweets (LOFT) file.
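
The three conditions of Algorithm 1 can be sketched in a few lines. The sketch assumes whitespace tokenization of the raw tweet text and treats both `http://` and `https://` prefixes (in any letter case) as links; the names `is_handle`, `is_null_tweet` and `filter_null_tweets` are illustrative and do not come from the paper.

```python
from typing import List


def is_handle(token: str) -> bool:
    """True for @user, #tag and RT tokens (the 'handles' of the text)."""
    return token.startswith(("@", "#")) or token == "RT"


def is_null_tweet(tweet: str) -> bool:
    """Apply the three null-tweet conditions in the order given in the text."""
    tokens = tweet.split()  # assumption: plain whitespace tokenization
    if len(tokens) <= 3:    # condition 1: tiny tweet
        return True
    if any(t.lower().startswith(("http://", "https://")) for t in tokens):
        return True         # condition 2: referred tweet (depends on a link)
    if all(is_handle(t) for t in tokens):
        return True         # condition 3: only handles, no text
    return False


def filter_null_tweets(corpus: List[str]) -> List[str]:
    """Return the list of filtered tweets (LOFT) for a corpus C."""
    return [tweet for tweet in corpus if not is_null_tweet(tweet)]


corpus = [
    "yeah, right! #sarcasm!",
    "RT @a @b #x #y",
    "Look at this http://t.co/x so funny",
    "Super easy to focus at work today #kidding",
]
print(filter_null_tweets(corpus))
```

In this small corpus only the last tweet passes the filter; the first three are rejected by the length, handle-only and link conditions respectively.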

2.2 Proposed Algorithm for Sarcasm Detection

The proposed approach is based on the context within a tweet, which is extracted from three dictionaries, namely negation words, hashtag words, and emoticons, as shown in Table 1. A negation word inverts the polarity of a word when it is placed before that word as a prefix, whereas a hashtag word flips the polarity of the entire sentence when it is appended as a suffix. As an instance of a negation word, consider “not happy”: happy is a positive word, but with ‘not’ as its prefix, the polarity of happy becomes negative. Similarly, as an instance of a hashtag word, consider “This is a perfect solution for sarcasm detection #not”. Without the hashtag word “#not”, the sentiment of this tweet is positive. However, with “#not” appended at the end of the tweet, the overall polarity of the tweet is inverted from positive to negative. Finally, emoticons are also capable of reversing a tweet’s polarity. For instance, “I see the diet is going well!”. The tweet seems positive, but the emoticons made it sarcastic. Therefore, according to the Macmillan English Dictionary for sarcasm, “#not” is capable of flipping the meaning of a text. Hence, it plays the role of context that makes the tweet sarcastic. These hashtag words are capable of making a sentence sarcastic under certain constraints.
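
The two flipping mechanisms described above, a negation word acting on the next word and a trailing hashtag word acting on the whole sentence, can be illustrated with a toy example. Everything in the sketch is an assumption made for illustration: the three small dictionaries only contain words quoted in the text plus one extra negative word (they are not the contents of Table 1), and the scoring scheme is a simple sum of word polarities.

```python
from typing import List

# Toy dictionaries for illustration only (not the paper's Table 1).
POLARITY = {"happy": 1, "perfect": 1, "easy": 1, "sad": -1}
NEGATION_WORDS = {"not"}
FLIP_HASHTAGS = {"#not", "#kidding"}


def text_polarity(tokens: List[str]) -> int:
    """Sum word polarities; a negation word inverts only the next word."""
    score, negate = 0, False
    for token in tokens:
        word = token.lower().strip(".,!?")
        if word in NEGATION_WORDS:
            negate = True
            continue
        value = POLARITY.get(word, 0)
        score += -value if negate else value
        negate = False  # the negation acts as a prefix of one word
    return score


def tweet_polarity(tweet: str) -> int:
    """Score the text part, then flip it if a trailing hashtag word is listed."""
    tokens = tweet.split()
    trailing = []
    while tokens and tokens[-1].startswith("#"):
        trailing.append(tokens.pop().lower().rstrip(".,!?"))
    score = text_polarity(tokens)
    if any(tag in FLIP_HASHTAGS for tag in trailing):
        score = -score  # the suffix hashtag inverts the whole sentence
    return score


print(tweet_polarity("not happy"))
print(tweet_polarity("This is a perfect solution for sarcasm detection #not"))
```

Both calls give a negative score, the first because of the prefix negation and the second because of the suffix hashtag, while the same sentences without "not" or "#not" score positive.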

Table 3. Manually annotated dataset for testing

The process of identifying a sarcastic tweet is given in Algorithm 2. It explains the step-wise procedure for sarcasm detection in a single filtered tweet based on the context provided by the hashtag word and emoticon dictionaries. Algorithm 2 takes the filtered tweets (LOFT) as input, determines the sentiment value of each tweet, and stores it in \(\delta \). Subsequently, the last bunch of hashtag words, which is usually appended after the text part of the tweet, is stored in \(\lambda \). If any hashtag word appears within the text part, the algorithm removes the hash symbol and treats it as a word. Further, it tokenizes the given tweet and checks for the presence of negation words. If a negation word is found, the algorithm flips the sentiment value of the corresponding tweet and examines the new sentiment value. If the new sentiment value is positive or neutral and any \(\lambda \) value is present in the hashtag dictionary, the tweet is classified as sarcastic; otherwise, it is classified as non-sarcastic. If no negation word is found in the tweet’s text part, the sentiment value of the tweet is positive, and any \(\lambda \) value is present in the hashtag dictionary, the tweet is classified as sarcastic; otherwise, it is considered non-sarcastic.
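
The decision rules of Algorithm 2, as described in the paragraph above, can be sketched as follows. This is a hedged reading rather than the authors' code: the sentiment scorer `toy_sentiment`, its lexicon, and the two small dictionaries are placeholders (the real dictionaries are those of Table 1), and emoticon handling is left out because the decision rules in the text are stated in terms of hashtag words only.

```python
from typing import Callable, List, Set, Tuple

# Placeholder dictionaries and lexicon, assumed for illustration.
NEGATION_WORDS: Set[str] = {"not"}
HASHTAG_WORDS: Set[str] = {"#not", "#kidding"}
TOY_LEXICON = {"easy": 1, "perfect": 1, "happy": 1, "terrible": -1}


def toy_sentiment(words: List[str]) -> int:
    """Return +1, 0 or -1 for the text part (stand-in for a real scorer)."""
    total = sum(TOY_LEXICON.get(w, 0) for w in words)
    return (total > 0) - (total < 0)


def split_tweet(tweet: str) -> Tuple[List[str], List[str]]:
    """Split a tweet into text-part words and the trailing hashtags (lambda)."""
    tokens = tweet.split()
    lam: List[str] = []
    while tokens and tokens[-1].startswith("#"):
        lam.append(tokens.pop().lower().rstrip(".,!?"))
    # Hashtags inside the text part lose their hash symbol and become words.
    words = [t.lstrip("#").lower().strip(".,!?\"'") for t in tokens]
    return words, lam


def is_sarcastic(tweet: str,
                 sentiment: Callable[[List[str]], int] = toy_sentiment) -> bool:
    words, lam = split_tweet(tweet)
    delta = sentiment(words)                         # sentiment value (delta)
    has_context = any(tag in HASHTAG_WORDS for tag in lam)
    if any(w in NEGATION_WORDS for w in words):
        delta = -delta                               # negation flips the value
        return delta >= 0 and has_context            # positive or neutral
    return delta > 0 and has_context                 # positive only


print(is_sarcastic("Super easy to focus at work today #kidding"))
```

Passing the scorer in as a parameter keeps the rule logic separate from the choice of sentiment analyzer, which the text does not specify.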

3 Experimental Results

In this work, the experiment was evaluated using three statistical measures, namely precision, recall, and F1-score. The evaluation starts with dataset annotation, followed by performance analysis using a confusion matrix.

3.1 Dataset Collection and Annotation

In this article, we manually collected and annotated 3000 tweets from Twitter using the various hashtags, negation words, and emoticons given in Table 1. The manually annotated dataset (MADS) is shown in Table 3. We observed that 437 tweets were unpredictable during annotation. Most of these unpredictable tweets lacked the context needed to identify sarcasm and met the criteria of a null tweet, so they were treated as null tweets.

3.2 Performance Analysis

The performance of the proposed algorithm for sarcasm detection was analyzed using rule-based classification. The experimental results of the proposed algorithm are given in Tables 4 and 5. Table 4 gives the confusion matrix for error analysis, and Table 5 shows the attained precision, recall, and F1-score.
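
For readers who want to reproduce such figures from a confusion matrix, the standard definitions can be computed as in the sketch below. The function name `classification_metrics` is an assumption, and the counts in the usage line are arbitrary placeholders, not values from Table 4.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, F1-score and accuracy from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with 'sarcastic' taken as the positive class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Placeholder counts for illustration only (not taken from the paper).
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

The guards against empty denominators return 0.0 instead of raising an error when a class is never predicted or never present.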

Table 4. Confusion matrix of proposed approaches for error analysis

Table 5. Comparison of precision, recall, and F1-score of the proposed approaches with some of the existing work.

4 Conclusion

This article deals with two things. The first is a process to detect and eliminate null tweets, which can be considered a preprocessing step. Here, tweets that act as noise in the dataset are eliminated. The second is a sarcasm detection algorithm based on the context within a tweet. The properties of hashtags and emoticons were exploited as the context within a tweet to identify it as sarcastic. The proposed algorithm was implemented and evaluated with and without null tweets in the dataset. Some state-of-the-art algorithms for sarcasm detection were also evaluated in the same way. It is observed that the sarcasm detection algorithms perform better after null tweets are filtered from the dataset.

References

1. Bamman, D., Smith, N.A.: Contextualized sarcasm detection on Twitter. In: Ninth International AAAI Conference on Web and Social Media (2015)
2. BBC: US Secret Service seeks Twitter sarcasm detector. http://www.bbc.com/news/technology-27711109/ (2014)
3. Bharti, S.K., Babu, K.S.: Sarcasm as a contradiction between a tweet and its temporal facts: a pattern-based approach. Int. J. Nat. Lang. Comput. (IJNLC) 7, 67–79 (2018)
4. Bharti, S.K., Sathya Babu, K., Jena, S.K.: Harnessing online news for sarcasm detection in Hindi Tweets. In: Shankar, B.U., Ghosh, K., Mandal, D.P., Ray, S.S., Zhang, D., Pal, S.K. (eds.) PReMI 2017. LNCS, vol. 10597, pp. 679–686. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69900-4_86
5. Bharti, S.K., Babu, K.S., Raman, R.: Context-based sarcasm detection in Hindi Tweets. In: 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR), pp. 1–6. IEEE (2017)
6. Bharti, S., Vachha, B., Pradhan, R., Babu, K., Jena, S.: Sarcastic sentiment detection in tweets streamed in real time: a big data approach. Digit. Commun. Netw. 2(3), 108–121 (2016)
7. Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised recognition of sarcastic sentences in Twitter and Amazon. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pp. 107–116. Association for Computational Linguistics (2010)
8. Joshi, A., Sharma, V., Bhattacharyya, P.: Harnessing context incongruity for sarcasm detection. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 2, pp. 757–762 (2015)
9. Khattri, A., Joshi, A., Bhattacharyya, P., Carman, M.J.: Your sentiment precedes you: using an author’s historical tweets to predict sarcasm. In: 6th Workshop on Computation Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA) 2015, p. 25 (2015)
10. Maynard, D., Greenwood, M.A.: Who cares about sarcastic tweets? Investigating the impact of sarcasm on sentiment analysis. In: LREC 2014 Proceedings. ELRA (2014)
11. Schifanella, R., de Juan, P., Tetreault, J., Cao, L.: Detecting sarcasm in multimodal social platforms. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 1136–1145. ACM (2016)
12. Tungthamthiti, P., Kiyoaki, S., Mohd, M.: Recognition of sarcasms in tweets based on concept level sentiment analysis and supervised learning approaches. In: Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing, pp. 404–413 (2014)
13. Wang, Z., Wu, Z., Wang, R., Ren, Y.: Twitter sarcasm detection exploiting a context-based model. In: Wang, J., et al. (eds.) WISE 2015. LNCS, vol. 9418, pp. 77–91. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26190-4_6

Author information

Authors and Affiliations

Pandit Deendayal Petroleum University, Gandhinagar, 382007, Gujarat, India
Santosh Kumar Bharti
National Institute of Technology, Rourkela, 769008, Odisha, India
Korra Sathya Babu
ITER, SOA University, Bhubaneswar, Odisha, India
Sambit Kumar Mishra

Corresponding author

Correspondence to Santosh Kumar Bharti.

Editor information

Editors and Affiliations

Tezpur University, Tezpur, India
Bhabesh Deka
Indian Statistical Institute, Kolkata, India
Pradipta Maji
Indian Statistical Institute, Kolkata, India
Sushmita Mitra
Tezpur University, Tezpur, India
Dhruba Kumar Bhattacharyya
Indian Institute of Technology Guwahati, Guwahati, India
Prabin Kumar Bora
Indian Statistical Institute, Kolkata, India
Sankar Kumar Pal

Cite this paper

Bharti, S.K., Babu, K.S., Mishra, S.K. (2019). An Improved Approach for Sarcasm Detection Avoiding Null Tweets. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D., Bora, P., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science, vol 11942. Springer, Cham. https://doi.org/10.1007/978-3-030-34872-4_29

DOI: https://doi.org/10.1007/978-3-030-34872-4_29
Published: 25 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34871-7
Online ISBN: 978-3-030-34872-4
eBook Packages: Computer Science, Computer Science (R0)
