Sarcasm identification in textual data: systematic review, research challenges and open directions

Artificial Intelligence Review

Abstract

Sarcasm is a form of sentiment in which people convey an implicit meaning, usually the opposite of the literal message, in order to hurt someone emotionally or to criticise something humorously. Sarcasm identification in textual data is one of the hardest challenges in natural language processing (NLP), and it has recently become an active research area because of its importance for improving sentiment analysis of social media data. Few studies have comprehensively reviewed the primary literature on sarcasm identification published within the last 11 years. This study therefore reviews classification techniques for sarcasm identification with respect to datasets, pre-processing, feature engineering, classification algorithms, and performance metrics. It considers articles published from 2008 to 2019. Forty (40) articles were selected from seven standard academic databases to carry out the review and meet its objectives. The study found that most researchers created their own datasets, since no standard dataset is available in the domain of sarcasm identification. Most studies used context-based and content-based linguistic features. N-grams and part-of-speech tagging were the most commonly used feature extraction techniques, binary representation and term frequency were used for feature representation, and chi-squared and information gain were used for feature selection. The classification algorithms most often applied were support vector machine, Naïve Bayes, random forest, maximum entropy, and decision tree, evaluated with accuracy, precision, recall, and F-measure. Finally, research challenges and future directions are summarized. This review highlights the impact of sarcasm identification on building effective product reviews and should serve as a handy resource for researchers and practitioners in sarcasm identification and text classification in general.
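
To make the feature-engineering terms in the abstract concrete, the following is a minimal sketch of n-gram extraction, term-frequency and binary representation, and chi-squared feature scoring. It is not taken from any of the reviewed studies: the function names (ngrams, featurize, chi_squared), the whitespace tokenizer, and the four toy sentences with their labels are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(text, max_n=2):
    """Lowercase, split on whitespace, and count every 1..max_n-gram.

    The returned Counter is a term-frequency representation.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def chi_squared(docs, labels, feature):
    """Chi-squared score of feature presence against a binary label.

    Built from the 2x2 contingency table:
      a = present & positive, b = present & negative,
      c = absent & positive,  d = absent & negative.
    """
    a = b = c = d = 0
    for counts, label in zip(docs, labels):
        present = feature in counts
        if present and label:
            a += 1
        elif present:
            b += 1
        elif label:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Toy, made-up examples (1 = sarcastic, 0 = not sarcastic).
texts = [
    "oh great another monday",
    "oh great my flight is delayed",
    "the flight was on time",
    "monday was a good day",
]
labels = [1, 1, 0, 0]

docs = [featurize(t) for t in texts]
tf = docs[0]                     # term frequency: n-gram -> count
binary = {g: 1 for g in tf}      # binary representation: presence only
score = chi_squared(docs, labels, ("oh", "great"))
print(score)
```

In this toy data the bigram ("oh", "great") appears only in the sarcastic examples, so it scores higher than a unigram such as ("monday",) that is spread evenly across both classes. A feature selection step would keep the highest-scoring n-grams.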

Author information

Correspondence to Christopher Ifeanyi Eke or Azah Anir Norman.

Appendix: Abbreviations used in this paper and their full forms

Abbreviation  Definition
AB            AdaBoost
ACC           Accuracy
ANN           Artificial neural networks
API           Application programming interface
AUC           Area under the curve
BoW           Bag of words
BR            Binary representation
CNN           Convolutional neural network
BW            Balanced winnow
CUE-CNN       Convolutional user embedding convolutional neural network
DNN           Deep neural network
DT            Decision tree
FC            Fuzzy clustering
F-M           F-measure
FN            False negative
FP            False positive
IG            Information gain
KS            Kappa statistics
k-NN          k-nearest neighbours
LSTM          Long short-term memory
LR            Logistic regression
ME            Maximum entropy
MI            Mutual information
NB            Naïve Bayes
NLP           Natural language processing
POS           Part-of-speech tagging
PRE           Precision
RB            Rule base
REC           Recall
RF            Random forest
RNN           Recurrent neural network
SLR           Systematic literature review
SMO           Sequential minimal optimization
SVM           Support vector machine
TF            Term frequency
TFIDF         Term frequency with inverse document frequency
TN            True negative
TP            True positive
TPR           True positive rate
URL           Uniform resource locator
VSF           Visual semantic feature
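
The performance abbreviations above (TP, FP, TN, FN, ACC, PRE, REC, F-M) are related through the standard confusion-matrix definitions. The sketch below shows those relationships for a binary sarcastic/non-sarcastic task. The function names and the example label vectors are hypothetical and are not results from the reviewed studies.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = sarcastic, 0 = not)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and F-measure (F1).

    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    f_m = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F-M": f_m}


# Made-up gold labels and predictions for eight texts.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
result = metrics(*counts)
print(counts, result)
```

Recall here is the same quantity as the true positive rate (TPR). Reporting precision, recall, and F-measure alongside accuracy matters when sarcastic examples are a small minority of the data, because a classifier that labels everything non-sarcastic can still reach high accuracy.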

  

About this article

Cite this article

Eke, C.I., Norman, A.A., Shuib, L. et al. Sarcasm identification in textual data: systematic review, research challenges and open directions. Artif Intell Rev 53, 4215–4258 (2020). https://doi.org/10.1007/s10462-019-09791-8

  • DOI: https://doi.org/10.1007/s10462-019-09791-8
