An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers

Fatima, Eman; Kanwal, Hira; Khan, Javed Ali; Khan, Nek Dil

doi:10.1007/s10515-024-00468-3

Published: 27 August 2024

Automated Software Engineering, Volume 31, article number 69 (2024)

Eman Fatima¹,
Hira Kanwal¹,
Javed Ali Khan² &
Nek Dil Khan³

Abstract

App stores enable users to provide insightful feedback on apps, which developers can use for future software enhancement and evolution. However, finding user reviews that are valuable and relevant for quality improvement and app enhancement is challenging because of the growing volume of end-user feedback. Also, to the best of our knowledge, existing sentiment analysis approaches do not consider sarcasm and its types when identifying the sentiments of end-user reviews for requirements decision-making. Moreover, no work has been reported on detecting sarcasm in app reviews. This paper proposes an automated approach that detects sarcasm and its types in end-user reviews and identifies valuable requirements-related information using natural language processing (NLP) and deep learning (DL) algorithms, to help software engineers better understand end-user sentiments. For this purpose, we crawled 55,000 end-user comments on seven software apps in the Play Store. Then, we developed a novel sarcasm coding guideline by critically analyzing end-user reviews and recovering frequently used sarcasm types such as Irony, Humor, Flattery, Self-Deprecation, and Passive Aggression. Next, using the coding guideline and the content analysis approach, we annotated 10,000 user comments and made them parsable for state-of-the-art DL algorithms. We conducted a survey at two different universities in Pakistan to identify participants’ accuracy in manually identifying sarcasm in end-user reviews. We developed a ground truth against which to compare the results of the DL algorithms. We then applied various fine-tuned DL classifiers to first detect sarcasm in the end-user feedback and then classify the sarcastic reviews into more fine-grained sarcasm types. For this, end-user comments are first pre-processed and the instances in the dataset are balanced. Then, feature engineering is applied to fine-tune the DL classifiers. We obtain an average accuracy of 97%, 96%, 96%, 96%, 96%, 86%, and 90% for binary classification (sarcasm detection) and 90%, 91%, 92%, 91%, 91%, 75%, and 89% for the fine-grained classification of sarcasm types, with the CNN, LSTM, BiLSTM, GRU, BiGRU, RNN, and BiRNN classifiers, respectively. Such information would help improve the performance of sentiment analysis approaches to better understand the sentiments associated with identified new features or issues.
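The two-stage flow described in the abstract (balance the annotated data, detect sarcasm, then assign a sarcasm type only to reviews flagged as sarcastic) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the balancing technique, so random oversampling is an assumption, and `toy_detector` and `toy_type_classifier` are hypothetical keyword-based stand-ins for the fine-tuned DL classifiers.

```python
import random
from collections import Counter

# Sarcasm types named in the abstract.
SARCASM_TYPES = ["Irony", "Humor", "Flattery", "Self-Deprecation", "Passive Aggression"]

def oversample(samples, seed=0):
    """Randomly duplicate minority-class samples until every class
    matches the size of the largest class (assumed balancing strategy)."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def classify_review(text, detect_sarcasm, classify_type):
    """Stage 1: binary sarcasm detection.
    Stage 2: type classification, run only on sarcastic reviews."""
    if not detect_sarcasm(text):
        return ("not sarcastic", None)
    return ("sarcastic", classify_type(text))

# Hypothetical stand-ins for trained models (e.g. a CNN or BiLSTM).
def toy_detector(text):
    return "yeah right" in text.lower()

def toy_type_classifier(text):
    return "Irony"  # a real model would pick one of SARCASM_TYPES

# Made-up example reviews, for illustration only.
reviews = [
    ("Great update, yeah right, now it crashes on launch.", "sarcastic"),
    ("Works fine after the update.", "not sarcastic"),
    ("Love the new dark mode.", "not sarcastic"),
]

balanced = oversample(reviews)
label_counts = Counter(label for _, label in balanced)
predictions = [classify_review(text, toy_detector, toy_type_classifier)
               for text, _ in reviews]
```

In practice, balancing would be applied to the training split only, and the two stand-in functions would be replaced by the trained binary and multi-class models.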

Data availability

The data sets and replication information are available at https://github.com/nekdil566/sarcasm_detection_in_user_reviews.

Author information

Authors and Affiliations

Department of CS and IT, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
Eman Fatima & Hira Kanwal
Department of Computer Science, School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, UK
Javed Ali Khan
Faculty of Information Technology, Beijing University of Technology, Beijing, China
Nek Dil Khan

Contributions

Methodology, formal analysis, and experimentation were conducted by E.F. and J.A.K. Data curation and the initial draft were prepared by H.K., while experimental validation, fine-tuning of the DL experiments, final revisions, and English polishing were undertaken by N.D.K. and J.A.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Javed Ali Khan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fatima, E., Kanwal, H., Khan, J.A. et al. An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers. Autom Softw Eng 31, 69 (2024). https://doi.org/10.1007/s10515-024-00468-3

Received: 01 April 2024
Accepted: 21 August 2024
Published: 27 August 2024
DOI: https://doi.org/10.1007/s10515-024-00468-3
