Sarcasm Detection in Social Media Based on Imbalanced Classification

Liu, Peng; Chen, Wei; Ou, Gaoyan; Wang, Tengjiao; Yang, Dongqing; Lei, Kai

doi:10.1007/978-3-319-08010-9_49

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8485)

Abstract

Sarcasm is a pervasive linguistic phenomenon in online documents that express subjective and deeply felt opinions. Detecting sarcasm is important for many NLP applications, such as sentiment analysis, opinion mining, and advertising. Current studies treat automatic sarcasm detection as a simple text classification problem: they do not use explicit features to detect sarcasm, and they ignore the imbalance between sarcastic and non-sarcastic samples in real applications. In this paper, we first explore the characteristics of both English and Chinese sarcastic sentences and introduce a set of features specifically for detecting sarcasm in social media. Then, we propose a novel multi-strategy ensemble learning approach (MSELA) to handle the imbalance problem. We evaluate our proposed model on English and Chinese data sets. Experimental results show that our ensemble approach outperforms state-of-the-art sarcasm detection approaches and popular imbalanced classification methods.
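
The abstract does not describe how MSELA works internally, so the following is only a generic illustration of the kind of problem it addresses, not the authors' method. It sketches one common way to handle class imbalance with an ensemble: each member is trained on all minority (e.g. sarcastic) samples plus an equally sized random subset of the majority samples, and predictions are combined by majority vote. The nearest-centroid base model, the function names, and the toy feature vectors are all assumptions made for illustration.

```python
import random

def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_balanced_ensemble(minority, majority, n_members=5, seed=0):
    """Build n_members nearest-centroid models (hypothetical base learner).

    Each member sees every minority sample and a random majority subset
    of the same size, so no single member is trained on skewed data.
    """
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    members = []
    for _ in range(n_members):
        subset = rng.sample(majority, k)
        members.append((centroid(minority), centroid(subset)))
    return members

def predict(members, x):
    """Majority vote: 1 = minority class (sarcastic), 0 = majority class."""
    votes_for_minority = sum(
        1 for pos, neg in members if sq_dist(x, pos) < sq_dist(x, neg)
    )
    return 1 if votes_for_minority * 2 > len(members) else 0

# Toy, made-up data: 3 minority samples against 30 majority samples.
minority = [[5.0, 5.0], [4.5, 5.5], [5.5, 4.5]]
majority = [[(i % 3) * 0.5, (i % 4) * 0.5] for i in range(30)]

members = train_balanced_ensemble(minority, majority)
label_near_minority = predict(members, [5.0, 5.0])
label_near_majority = predict(members, [0.0, 0.0])
```

Using an odd number of members avoids tied votes. In practice the base learner and the resampling strategy would be chosen to suit the feature set, which this page does not specify.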

Author information

Authors and Affiliations

Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, 100871, China
Peng Liu, Wei Chen, Gaoyan Ou, Tengjiao Wang & Dongqing Yang
School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
Peng Liu, Wei Chen, Gaoyan Ou, Tengjiao Wang & Dongqing Yang
The Shenzhen Key Lab for Cloud Computing Technology and Applications, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
Peng Liu & Kai Lei

Editor information

Editors and Affiliations

School of Computing, University of Utah, 50 S. Central Campus Drive, 84112, Salt Lake City, UT, USA
Feifei Li
Department of Computer Science, Tsinghua University, 100084, Beijing, China
Guoliang Li
POSTECH, Republic of Korea
Seung-won Hwang
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Bin Yao
Advanced Digital Sciences Center (ADSC), 138632, Singapore, Singapore
Zhenjie Zhang

About this paper

Cite this paper

Liu, P., Chen, W., Ou, G., Wang, T., Yang, D., Lei, K. (2014). Sarcasm Detection in Social Media Based on Imbalanced Classification. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_49

DOI: https://doi.org/10.1007/978-3-319-08010-9_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)
