An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems

Murgia, Alessandro; Ortu, Marco; Tourani, Parastou; Adams, Bram; Demeyer, Serge

doi:10.1007/s10664-017-9526-0

Published: 09 June 2017

Volume 23, pages 521–564 (2018)

Empirical Software Engineering

Alessandro Murgia (ORCID: orcid.org/0000-0002-8990-0624)¹,
Marco Ortu²,
Parastou Tourani³,
Bram Adams³ &
Serge Demeyer¹

Abstract

Software development—just like any other human collaboration—inevitably evokes emotions like joy or sadness, which are known to affect the group dynamics within a team. Today, little is known about those individual emotions and whether they can be discerned at all in the development artifacts produced during a project. This paper analyzes (a) whether issue reports—a common development artifact, rich in content—convey emotional information and (b) whether humans agree on the presence of these emotions. From the analysis of the issue comments of 117 projects of the Apache Software Foundation, we find that developers express emotions (in particular gratitude, joy and sadness). However, the more context is provided about an issue report, the more human raters start to doubt and nuance their interpretation. Based on these results, we demonstrate the feasibility of a machine learning classifier for identifying issue comments containing gratitude, joy and sadness. Such a classifier, using emotion-driving words and technical terms, obtains a good precision and recall for identifying the emotion love, while for joy and sadness a lower recall is obtained.

Notes

In the pilot study, 4 raters were permuted to label 400 comments, with 200 comments per rater (cf. Section 4.2.1), while in the full study, 16 raters were permuted to label 392 comments, with 98 comments per rater (cf. Section 4.2.2). In all studies, each rater was paired with each other rater the same number of times.
The data set can be downloaded for replication purposes from a website hosted by the University of Antwerp: http://ansymore.uantwerpen.be/system/files/uploads/artefacts/alessandro/MSR16/archive3.zip.
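As a quick consistency check on the first note, the rater workload implies how many independent labels each comment received. The sketch below assumes that every comment was labeled the same number of times, which the note does not state explicitly; the function name is illustrative only.

```python
def labels_per_comment(raters: int, comments_per_rater: int, comments: int) -> float:
    """Average number of raters who labeled each comment."""
    total_labels = raters * comments_per_rater  # all labels produced in the study
    return total_labels / comments


# Figures taken from the first note above.
pilot = labels_per_comment(raters=4, comments_per_rater=200, comments=400)
full = labels_per_comment(raters=16, comments_per_rater=98, comments=392)
print(pilot, full)  # 2.0 4.0
```

Under that assumption, each pilot comment was labeled by two raters and each full-study comment by four.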

References

Ahmed T, Srivastava A (2017) Understanding and evaluating the behavior of technical users. a study of developer interaction at stackoverflow. Human-centric Computing and Information Sciences 7(1):8
Amabile T M, Barsade S G, Mueller J S, Staw B M (2005) Affect and creativity at work. Adm Sci Q 50(3):367–403. doi:10.2307/30037208
Aman S, Szpakowicz S (2007) Identifying expressions of emotion in text 10th international conference on text, speech and dialogue (TSD). Springer, pp 196–205
Ambler S (2002) Agile modeling: effective practices for extreme programming and the unified process. Wiley, New York
Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts Proceedings of the international conference on software engineering (ICSE), pp 375–384
Bacchelli A, Sasso TD, D’Ambros M, Lanza M (2012) Content classification of development emails Proceedings of the international conference on software engineering (ICSE), pp 375–385
Balabantaray R, Mohammad M, Sharma N (2012) Multi-class twitter emotion classification: a new approach. International Journal of Applied Information Systems 4 (1):48–53
Bazelli B, Hindle A, Stroulia E (2013) On the personality traits of stackoverflow users International conference on software maintenance (ICSM). doi:10.1109/ICSM.2013.72, pp 460–463
Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. Journal of Computational Science 2(1):1–8
Brodkin J (2013) Linus torvalds defends his right to shame linux kernel developers. http://www.webcitation.org/6O2zErgzE
Brooks FP Jr (1987) No silver bullet essence and accidents of software engineering. Computer 20(4):10–19
Campbell DT, Stanley JC (1963) Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin
Cataldi M, Ballatore A, Tiddi I, Aufaure M A (2013) Good location, terrible food: detecting feature sentiment in user-generated reviews. Social Netw Analys Mining 3(4):1149–1163
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Das S R, Chen M Y (2007) Yahoo! for amazon: sentiment extraction from small talk on the web. Manag Sci 53(9):1375–1388. http://EconPapers.repec.org/RePEc:inm:ormnsc:v:53:y:2007:i:9:p:1375-1388
De Choudhury M, Counts S (2013) Understanding affect in the workplace via social media Proceedings of the conference on computer supported cooperative work. ACM, New York. doi:10.1145/2441776.2441812, pp 303–316
DeMarco T, Lister T (1999) Peopleware: productive projects and teams, 2nd edn. Dorset House Publishing Co. Inc, New York
Destefanis G, Marco O, Steve C, Steve S, Michele M, Roberto T (2016) Software development: do good manners matter? PeerJ Comp Sci 2:e73. doi:10.7717/peerj-cs.73
Elfenbein H A, Ambady N (2002) On the universality and cultural specificity of emotion recognition: a meta-analysis. Psychol Bull 128(2):203
Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89. doi:10.1145/2436256.2436274
Fleiss J L (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378
Fowler J H, Christakis N A (2008) Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the framingham heart study. BMJ 337. doi:10.1136/bmj.a2338
Fredrickson B L (2001) The role of positive emotions in positive psychology: the broaden-and-build theory of positive emotions. Am Psychol 56(3):218
Fritz T, Müller S (2016) Leveraging biometric data to boost software developer productivity International conference on software analysis, evolution and reengineering (future of software engineering track), s.n
Gold J (2015) A prominent linux kernel developer is stepping down from her direct work in the kernel community. http://www.networkworld.com/article/2988850/opensource-subnet/linux-kernel-dev-sarah-sharp-quits-citing-brutal-communications-style.html
Graziotin D, Wang X, Abrahamsson P (2014) Happy software developers solve problems better: psychological measurements in empirical software engineering. PeerJ e289. doi:10.7717/peerj.289
Guillory J, Spiegel J, Drislane M, Weiss B, Donner W, Hancock J (2011) Upset now?: emotion contagion in distributed groups Proceedings of the conference on human factors in computing systems (CHI), pp 745–748
Guzman E, Bruegge B (2013) Towards emotional awareness in software development teams Proceedings of the joint meeting on foundations of software engineering (ESEC/FSE), pp 671–674
Guzman E, Azócar D, Li Y (2014) Sentiment analysis of commit comments in github: an empirical study Proceedings of the working conference on mining software repositories (MSR). ACM, New York, MSR 2014. doi:10.1145/2597073.2597118, pp 352–355
Guzzi A, Bacchelli A, Lanza M, Pinzger M, van Deursen A (2013) Communication in open source software development mailing lists Proceedings of the working conference on mining software repositories (MSR), pp 277–286
Hancock JT, Gee K, Ciaccio K, Lin JMH (2008) I’m sad you’re sad: emotional contagion in CMC Proceedings of the 2008 ACM conference on computer supported cooperative work (CSCW), pp 295–298
Heritage Dictionary A (2005) The american heritage science dictionary. http://dictionary.reference.com/browse/
Hu M, Liu B (2004) Mining and summarizing customer reviews Proceedings of the international conference on knowledge discovery and data mining, ACM, New York, KDD ’04. doi:10.1145/1014052.1014073, pp 168–177
Jongeling R, Datta S, Serebrenik A (2015) Choosing your weapons: on sentiment analysis tools for software engineering research IEEE international conference on software maintenance and evolution (ICSME)
Mäntylä M, Adams B, Destefanis G, Graziotin D, Ortu M (2016) Mining valence, arousal, and dominance: possibilities for detecting burnout and productivity? Proceedings of the 13th international workshop on mining software repositories, ACM, pp 247–258
Mitchell T M (1997) Machine learning, 1st edn. McGraw-Hill Inc, New York
Murgia A, Tourani P, Adams B, Ortu M (2014) Do developers feel emotions? An exploratory analysis of emotions in software artifacts Proceedings of the working conference on mining software repositories (MSR). ACM, pp 262–271
Nagappan M, Zimmermann T, Bird C (2013) Diversity in software engineering research Proceedings of the 2013 9th joint meeting on foundations of software engineering, ACM, New York, ESEC/FSE 2013. doi:10.1145/2491411.2491415, pp 466–476
Ortu M, Adams B, Destefanis G, Tourani P, Marchesi M, Tonelli R (2015a) Are bullies more productive? Empirical study of affectiveness vs. issue fixing time Proceedings of the working conference on mining software repositories (MSR). Florence, Italy
Ortu M, Destefanis G, Kassab M, Counsell S, Marchesi M, Tonelli R (2015b) Would you mind fixing this issue? International conference on agile software development. Springer, pp 129–140
Ortu M, Destefanis G, Counsell S, Swift S, Tonelli R, Marchesi M (2016a) Arsonists or firefighters? Affectiveness in agile software development International conference on agile software development. Springer, pp 144–155
Ortu M, Murgia A, Destefanis G, Tourani P, Tonelli R, Marchesi M, Adams B (2016b) The emotional side of software developers in jira Proceedings of the 13th international conference on mining software repositories, ACM, New York, MSR ’16. doi:10.1145/2901739.2903505, pp 480–483
Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Chair N C C, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Rosner M, Tapias D (eds) Proceedings of the international conference on language resources and evaluation (LREC), European language resources association (ELRA). Valletta, Malta
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1-2):1–135
Parrott W (2001) Emotions in social psychology. Psychology Press
Piller C (1999) Everyone is a critic in cyberspace. Los Angeles Times 3(12):A1
Plutchik R (2001) The nature of emotions human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. Am Sci 89(4):344–350
Rigby PC, Hassan AE (2007) What can OSS mailing lists tell us? a preliminary psychometric text analysis of the apache developer mailing list Proceedings of the working conference on mining software repositories (MSR), p 23
Robinson M D (2004) Personality as performance categorization tendencies and their correlates. Curr Dir Psychol Sci 13(3):127–129
Sehgal V, Song C (2007) Sops: stock prediction using web sentiment Proceedings of the international conference on data mining workshops (ICDMW). IEEE Computer Society, Washington, DC, pp 21–26
semotion (2016) The first international workshop on emotion awareness in software engineering, ICSE 2016, Workshop, Austin, Texas (USA)
Shivhare S N, Khethawat S (2012) Emotion detection from text. Computer Science, Engineering and Applications
Strapparava C, Valitutti A, et al. (2004) Wordnet affect: an affective extension of wordnet LREC, vol 4, pp 1083–1086
Tepperman J, Traum D, Narayanan SS (2006) “Yeah right”: sarcasm recognition for spoken dialogue systems Proceedings of interspeech, pp 1838–1841
Tourani P, Adams B (2016) The impact of human discussions on just-in-time quality assurance Proceedings of the 23rd IEEE international conference on software analysis, evolution, and reengineering (SANER). Osaka, Japan, pp 189–200
Tourani P, Jiang Y, Adams B (2014) Monitoring sentiment in open source mailing lists — exploratory study on the apache ecosystem Proceedings of the 2014 conference of the center for advanced studies on collaborative research (CASCON). Toronto, ON, Canada, pp 34–44
Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann


Acknowledgements

This work was sponsored by (a) the Institute for the Promotion of Innovation through Science and Technology in Flanders by means of a project entitled Change-centric Quality Assurance (CHAQ) with number 120028, as well as (b) the Regione Autonoma della Sardegna (RAS), Regional Law No. 7-2007, project CRP-17938, “LEAN 2.0”.

Author information

Alessandro Murgia
Present address: Middelheimlaan 1, 2020 Antwerpen, Belgium

Authors and Affiliations

University of Antwerp, Antwerp, Belgium
Alessandro Murgia & Serge Demeyer
University of Cagliari, Cagliari, Italy
Marco Ortu
MCIS, Polytechnique Montréal, Montréal, QC, Canada
Parastou Tourani & Bram Adams

Corresponding author

Correspondence to Alessandro Murgia.

Additional information

Communicated by: Sunghun Kim

This work was sponsored by (1) the Institute for the Promotion of Innovation through Science and Technology in Flanders by means of a project entitled Change-centric Quality Assurance (CHAQ) with number 120028, and (2) the Regione Autonoma della Sardegna (RAS), Regional Law No. 7-2007, project CRP-17938, “LEAN 2.0”.

Appendices

Appendix A: Email Sent to Raters

To ensure that participants understood the emotions, yet were not biased during the labeling process, we provided only minimal and dry training. Once they agreed to participate in an “ongoing experiment”, we sent them an email to clarify the goal of the experiment. The participants were not aware of how many other participants were involved in the experiment, nor of the underlying goals. All the experiments were carried out via Google Spreadsheets. The email we sent to participants follows.

Dear XXXXX,

We are performing an experiment on emotions in bug reports, and we would like you to participate in this experiment.

We have created a dataset containing bug report comments by real open source developers. Your task would be to label these comments using a mixture of 6 emotions: Love, Joy, Sadness, Fear, Anger or Surprise. If no emotion can be observed, then the comment automatically is labeled as Neutral.

Attached to this mail, you can find a document that describes the 6 emotions that we use for the experiment. Moreover, it provides some examples of emotion labeling. Please take a look.

Following the link: XXXXX

You get access to a spreadsheet with 2 pages:

> ExampleLabeling: describes an example on how to label text comments. If you think an emotion can be observed in the comment, there will be an x in the corresponding cell. Multiple cells can be selected if multiple emotions are present. Absence of any x means that that comment is Neutral. You have to label only the emotions in the comment reported in the red-highlighted column (Comment N). The other comments (Comment N-1, until Comment 1), if available, are the preceding comments of an issue report, meant to explain the context of Comment N.

> Round1-SpreadsheetX: this document contains the comments that you have to label in Round 1.

The deadline for the results of round 1 are due XXXXX. Thanks again for participating and for returning your results on time!
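The labeling scheme described in the email (an x per observed emotion, several x marks allowed, no x meaning Neutral) can be sketched as a small decoding function. This is only an illustration: the dictionary representation of a spreadsheet row and the function name are assumptions, not the format the authors actually processed.

```python
# The six emotions named in the email, in the order given there.
EMOTIONS = ("Love", "Joy", "Sadness", "Fear", "Anger", "Surprise")


def labels_from_row(cells: dict) -> list:
    """Decode one spreadsheet row into a list of emotion labels.

    `cells` maps an emotion name to the raw cell content (hypothetical layout).
    A cell containing "x" (case-insensitive) marks the emotion as present;
    a row without any "x" is labeled Neutral.
    """
    marked = [e for e in EMOTIONS if str(cells.get(e, "")).strip().lower() == "x"]
    return marked or ["Neutral"]


print(labels_from_row({"Joy": "x", "Love": "X"}))  # ['Love', 'Joy']
print(labels_from_row({"Anger": ""}))              # ['Neutral']
```

Treating Neutral as the absence of marks, rather than as a seventh column, mirrors the rule in the email that a comment "automatically" becomes Neutral when no emotion is observed.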

Appendix B: Analysis of Full Study Excluding the Authors

To assess the impact of the learning effect between the pilot study and the full study on the ratings made by the first four authors, this appendix re-analyzes the results of the full study after removing the ratings from the first four authors. As specified in Section 7, we re-analyze RQ1 and RQ2 using a dataset of 210 comments labeled by the 12 raters not involved in the pilot study (i.e., excluding the authors). To simplify the comparison, Table 19 maps the tables reported in the original case study results to the new ones.

Table 19 Mapping between appendix and case study tables


Table 20 Percentage of agreement out of 210 comments (absolute number in parentheses) and Cohen κ values (with confidence intervals) for each emotion in RQ1 — Rater’s Agreement (full study)
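The two quantities named in the caption of Table 20, percentage of agreement and Cohen κ, can be computed for a pair of raters as in the following minimal sketch. It assumes each rater's labels for one emotion are encoded as 1 (present) or 0 (absent); the example ratings are made up for illustration and are not data from the study.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_o = percent_agreement(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal label frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both raters use one identical label
    return (p_o - p_e) / (1.0 - p_e)


# Made-up labels for one emotion (1 = present, 0 = absent) on eight comments.
rater_a = [1, 1, 0, 0, 1, 0, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 0]
print(percent_agreement(rater_a, rater_b))      # 0.75
print(round(cohen_kappa(rater_a, rater_b), 3))  # 0.467
```

The example shows why both figures are reported: 75% raw agreement shrinks to κ ≈ 0.47 once agreement expected by chance is discounted.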


Table 21 Percentage of agreement out of 210 comments between four (2nd column) and at least three (3rd column) raters, together with the Fleiss κ inter-rater agreement and confidence intervals (4th to 6th column)
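For more than two raters per comment, as in Table 21, Fleiss κ generalizes the chance-corrected agreement. The sketch below follows the standard definition and assumes the ratings are given as per-comment category counts with the same number of raters for every comment; the example counts are invented for illustration.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every row must sum to the same number of raters (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    # Overall proportion of ratings falling into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Per-item agreement: share of rater pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_bar = sum(p_item) / n_items
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        return float("nan")  # undefined when all ratings fall into one category
    return (p_bar - p_e) / (1.0 - p_e)


# Invented example: 4 raters, columns = [emotion present, emotion absent].
perfect = [[4, 0], [0, 4], [4, 0], [0, 4]]
split = [[2, 2], [2, 2]]
print(fleiss_kappa(perfect))          # 1.0
print(round(fleiss_kappa(split), 3))  # -0.333
```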


Note that in Tables 20, 21 and 22 the confidence interval is ± 7% instead of the ± 5% used in Tables 9, 10 and 12. This is because the sample used for the tables in this appendix consists of only 210 comments.
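The widening of the interval with a smaller sample can be reproduced with the usual margin-of-error formula for a proportion. This is a sketch under stated assumptions: a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and no finite-population correction. None of these is stated in this appendix, and the comparison with 392 comments assumes that the ± 5% tables use the full-study sample mentioned in the Notes.

```python
import math


def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Half-width of the confidence interval for a proportion from n samples."""
    return z * math.sqrt(p * (1.0 - p) / n)


for n in (392, 210):
    print(n, f"{margin_of_error(n):.1%}")
# 392 4.9%
# 210 6.8%
```

Under those assumptions, 392 comments give roughly ± 5% and 210 comments roughly ± 7%, matching the figures quoted above.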

Table 22 Percentage of agreement and Cohen κ values (with confidence intervals) for comments without and with context (RQ2 — Context Influence)


Table 23 The number of times raters changed their rating from the rating in a row (comment without context) to the one in a column (comment with context)


Table 24 How often raters went from disagreement (d) to agreement (a) or vice versa when comparing the set of comments without context (rows) to the set of comments with context (columns), for groups A, B, and when combining both groups (at least three raters agreeing)


About this article

Cite this article

Murgia, A., Ortu, M., Tourani, P. et al. An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems. Empir Software Eng 23, 521–564 (2018). https://doi.org/10.1007/s10664-017-9526-0


Published: 09 June 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10664-017-9526-0
