On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting

Obaidi, Martin; Holm, Henrik; Schneider, Kurt; Klünder, Jil

doi:10.1007/978-3-031-21388-5_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13709)

Included in the following conference series:

International Conference on Product-Focused Software Process Improvement

Abstract

A positive working climate is essential in modern software development. It enhances productivity, since satisfied developers tend to deliver better results. Sentiment analysis tools analyze and classify textual communication between developers according to the polarity of the statements. Most of these tools deliver promising results when used with test data from the domain they were developed for (e.g., GitHub), but their outcomes lack reliability when used in a different domain (e.g., Stack Overflow). One possible way to mitigate this problem is to combine tools trained in different domains. In this paper, we analyze the reliability and performance of a voting classifier that combines three sentiment analysis tools. The tools are trained and evaluated on five existing polarity data sets (e.g., from GitHub). The results indicate that this kind of combination is a good choice in the within-platform setting. However, a majority vote does not necessarily lead to better results in the cross-platform setting, where the best individual tool in the ensemble is preferable in most cases. This is mainly due to the often large differences in performance between the individual tools, even on the same data set. It may also be due to differences between the annotated data sets.
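The voting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three-label scheme, and in particular the fallback to "neutral" when no label wins a strict plurality are assumptions made for illustration, since the abstract does not state how ties are resolved. The toy labels are invented and do not come from any of the five data sets.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str], fallback: str = "neutral") -> str:
    """Return the label most tools agree on for one document.

    Assumption: if the top two labels are tied (e.g. three tools give
    three different labels), the fallback label is returned.
    """
    ranked = Counter(predictions).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback
    return ranked[0][0]


def ensemble_predict(per_tool_labels: Sequence[Sequence[str]]) -> List[str]:
    """Combine one label list per tool into one voted label per document."""
    # zip(*...) regroups the per-tool lists into per-document tuples.
    return [majority_vote(doc_labels) for doc_labels in zip(*per_tool_labels)]


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of documents whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical outputs of three tools on four documents.
    tool_a = ["positive", "negative", "neutral", "positive"]
    tool_b = ["positive", "neutral", "neutral", "negative"]
    tool_c = ["negative", "negative", "positive", "neutral"]
    gold = ["positive", "negative", "neutral", "positive"]

    voted = ensemble_predict([tool_a, tool_b, tool_c])
    for name, labels in [("A", tool_a), ("B", tool_b), ("C", tool_c), ("vote", voted)]:
        print(name, accuracy(labels, gold))
```

Comparing the per-tool accuracies with the voted accuracy on such toy data shows how a single strong tool can outperform the ensemble when the other tools disagree with it, which is the kind of effect the abstract reports for the cross-platform setting.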

Acknowledgment

This research was funded by the Leibniz University Hannover as a Leibniz Young Investigator Grant (Project ComContA, Project Number 85430128, 2020–2022).

Author information

Authors and Affiliations

Software Engineering Group, Leibniz University Hannover, Welfengarten 1, 30167, Hannover, Germany
Martin Obaidi, Henrik Holm, Kurt Schneider & Jil Klünder

Authors

Martin Obaidi
Henrik Holm
Kurt Schneider
Jil Klünder

Corresponding author

Correspondence to Martin Obaidi.

Editor information

Editors and Affiliations

Tampere University, Tampere, Finland
Davide Taibi
Reutlingen University, Reutlingen, Germany
Marco Kuhrmann
University of Jyväskylä, Jyväskylä, Finland
Tommi Mikkonen
Leibniz University Hannover, Hannover, Germany
Jil Klünder
University of Jyväskylä, Jyväskylä, Finland
Pekka Abrahamsson

About this paper

Cite this paper

Obaidi, M., Holm, H., Schneider, K., Klünder, J. (2022). On the Limitations of Combining Sentiment Analysis Tools in a Cross-Platform Setting. In: Taibi, D., Kuhrmann, M., Mikkonen, T., Klünder, J., Abrahamsson, P. (eds) Product-Focused Software Process Improvement. PROFES 2022. Lecture Notes in Computer Science, vol 13709. Springer, Cham. https://doi.org/10.1007/978-3-031-21388-5_8

DOI: https://doi.org/10.1007/978-3-031-21388-5_8
Published: 14 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21387-8
Online ISBN: 978-3-031-21388-5
eBook Packages: Computer Science (R0)
