An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers

Automated Software Engineering

Abstract

App stores enable users to provide insightful feedback on apps, which developers can use for future software enhancement and evolution. However, finding user reviews that are valuable and relevant for quality improvement and app enhancement is challenging because of the growing volume of end-user feedback. Moreover, to the best of our knowledge, existing sentiment analysis approaches do not consider sarcasm and its types when identifying the sentiments of end-user reviews for requirements decision-making, and no work has been reported on detecting sarcasm in app reviews. This paper proposes an automated approach that detects sarcasm and its types in end-user reviews and identifies valuable requirements-related information using natural language processing (NLP) and deep learning (DL) algorithms, to help software engineers better understand end-user sentiments. For this purpose, we crawled 55,000 end-user comments on seven software apps in the Play Store. We then developed a novel sarcasm coding guideline by critically analyzing end-user reviews and recovering frequently used sarcasm types such as Irony, Humor, Flattery, Self-Deprecation, and Passive Aggression. Next, using the coding guideline and the content analysis approach, we annotated 10,000 user comments and made them parsable for state-of-the-art DL algorithms. We conducted a survey at two universities in Pakistan to measure participants’ accuracy in manually identifying sarcasm in end-user reviews, and we developed a ground truth against which to compare the results of the DL algorithms. We then applied various fine-tuned DL classifiers to first detect sarcasm in the end-user feedback and then classify the sarcastic reviews into more fine-grained sarcasm types. For this, end-user comments are first pre-processed and the class instances in the dataset are balanced. Then, feature engineering is applied to fine-tune the DL classifiers. We obtain average accuracies of 97%, 96%, 96%, 96%, 96%, 86%, and 90% for binary classification and 90%, 91%, 92%, 91%, 91%, 75%, and 89% for the fine-grained classification of sarcasm types with the CNN, LSTM, BiLSTM, GRU, BiGRU, RNN, and BiRNN classifiers, respectively. Such information would help improve the performance of sentiment analysis approaches to better understand the sentiments associated with newly identified features or issues.
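The two-stage flow described in the abstract (balance the annotated data, detect sarcasm, then assign a sarcasm type) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the balancing technique, so random oversampling is an assumption, and `oversample`, `classify_review`, `toy_detector`, and `toy_categorizer` are hypothetical names, with the toy functions standing in for the fine-tuned DL classifiers.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Sarcasm types named in the abstract.
SARCASM_TYPES = ["Irony", "Humor", "Flattery", "Self-Deprecation", "Passive Aggression"]

Sample = Tuple[str, str]  # (review text, label)


def oversample(samples: Sequence[Sample], seed: int = 0) -> List[Sample]:
    """Duplicate random minority-class samples until all classes are equal in size.

    Assumption: random oversampling is a stand-in; the abstract only says
    the dataset instances are balanced.
    """
    if not samples:
        return []
    rng = random.Random(seed)
    by_label: Dict[str, List[Sample]] = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced: List[Sample] = []
    for group in by_label.values():
        balanced.extend(group)
        # Top up smaller classes with random duplicates.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def classify_review(
    text: str,
    detect: Callable[[str], bool],
    categorize: Callable[[str], str],
) -> Optional[str]:
    """Two-stage cascade: None if not sarcastic, otherwise a sarcasm type."""
    if not detect(text):
        return None
    return categorize(text)


# Hypothetical stand-ins for the trained binary and multi-class classifiers.
def toy_detector(text: str) -> bool:
    lowered = text.lower()
    return "great" in lowered and "crash" in lowered


def toy_categorizer(text: str) -> str:
    return SARCASM_TYPES[0]
```

In practice, `detect` and `categorize` would wrap the trained binary and fine-grained models; keeping them as plain callables lets the same cascade be evaluated with any of the classifier families listed above.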

Data availability

The data sets and replication information are available at https://github.com/nekdil566/sarcasm_detection_in_user_reviews.

Notes

  1. https://www.iub.edu.pk/

  2. https://www.kfueit.edu.pk/

  3. https://docs.google.com/forms/d/e/1FAIpQLSdN958qs5e9XfrC_ZnqceY1UmJ4ouYi5-xr6lgdgF1NE-2pnA/viewform?usp=sf_link

  4. https://www.iub.edu.pk/

  5. https://www.kfueit.edu.pk/

Author information

Contributions

Methodology, formal analysis, and experimentation were conducted by E.F. and J.A.K. Data curation and the initial draft were prepared by H.K., while experimental validation, fine-tuning of the DL experiments, final revisions, and English polishing were undertaken by N.D.K. and J.A.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Javed Ali Khan.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fatima, E., Kanwal, H., Khan, J.A. et al. An exploratory and automated study of sarcasm detection and classification in app stores using fine-tuned deep learning classifiers. Autom Softw Eng 31, 69 (2024). https://doi.org/10.1007/s10515-024-00468-3

  • DOI: https://doi.org/10.1007/s10515-024-00468-3
