An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems

Empirical Software Engineering

Abstract

Software development—just like any other human collaboration—inevitably evokes emotions like joy or sadness, which are known to affect the group dynamics within a team. Today, little is known about those individual emotions and whether they can be discerned at all in the development artifacts produced during a project. This paper analyzes (a) whether issue reports—a common development artifact, rich in content—convey emotional information and (b) whether humans agree on the presence of these emotions. From the analysis of the issue comments of 117 projects of the Apache Software Foundation, we find that developers express emotions (in particular gratitude, joy and sadness). However, the more context is provided about an issue report, the more human raters start to doubt and nuance their interpretation. Based on these results, we demonstrate the feasibility of a machine learning classifier for identifying issue comments containing gratitude, joy and sadness. Such a classifier, using emotion-driving words and technical terms, obtains a good precision and recall for identifying the emotion love, while for joy and sadness a lower recall is obtained.

Notes

  1. In the pilot study, 4 raters were permuted to label 400 comments (200 comments per rater; cf. Section 4.2.1), while in the full study, 16 raters were permuted to label 392 comments (98 comments per rater; cf. Section 4.2.2). In both studies, each rater was paired with every other rater the same number of times.

  2. The data set can be downloaded for replication purposes from a website hosted by the University of Antwerp: http://ansymore.uantwerpen.be/system/files/uploads/artefacts/alessandro/MSR16/archive3.zip.
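
The figures in note 1 imply how many raters saw each comment. The following minimal sketch works through that arithmetic; the function name `raters_per_comment` is hypothetical, and the sketch only checks the totals. It does not model how raters were paired.

```python
def raters_per_comment(num_raters: int, comments_per_rater: int, num_comments: int) -> int:
    """Return how many distinct ratings each comment receives.

    Assumes every comment is rated equally often, which the note implies
    but does not state as a formula.
    """
    total_ratings = num_raters * comments_per_rater
    if total_ratings % num_comments != 0:
        raise ValueError("ratings cannot be spread evenly over the comments")
    return total_ratings // num_comments


# Pilot study: 4 raters, 200 comments each, 400 comments in total.
pilot = raters_per_comment(4, 200, 400)
# Full study: 16 raters, 98 comments each, 392 comments in total.
full = raters_per_comment(16, 98, 392)
print(pilot, full)
```

Under these assumptions each pilot comment is rated by two raters and each full-study comment by four, which is consistent with the appendix tables reporting agreement between four raters.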

Acknowledgements

This work was sponsored by (a) the Institute for the Promotion of Innovation through Science and Technology in Flanders by means of a project entitled Change-centric Quality Assurance (CHAQ) with number 120028, as well as (b) the Regione Autonoma della Sardegna (RAS), Regional Law No. 7-2007, project CRP-17938, “LEAN 2.0”.

Author information

Corresponding author

Correspondence to Alessandro Murgia.

Additional information

Communicated by: Sunghun Kim

Appendices

Appendix A: Email Sent to Raters

To ensure that participants understood the emotions, yet were not biased during the labeling process, we provided minimal, dry training. Once they had agreed to participate in an “ongoing experiment”, we sent them an email to clarify the goal of the experiment. The participants were not aware of how many other participants were involved in the experiment, nor of the underlying goals. All the experiments were carried out via Google Spreadsheets. The email we sent to participants follows.

Dear XXXXX,

We are performing an experiment on emotions in bug reports, and we would like you to participate in this experiment.

We have created a dataset containing bug report comments by real open source developers. Your task would be to label these comments using a mixture of 6 emotions: Love, Joy, Sadness, Fear, Anger or Surprise. If no emotion can be observed, then the comment automatically is labeled as Neutral.

Attached to this mail, you can find a document that describes the 6 emotions that we use for the experiment. Moreover, it provides some examples of emotion labeling. Please take a look.

Following the link: XXXXX

You get access to a spreadsheet with 2 pages:

> ExampleLabeling: describes an example on how to label text comments. If you think an emotion can be observed in the comment, there will be an x in the corresponding cell. Multiple cells can be selected if multiple emotions are present. Absence of any x means that that comment is Neutral. You have to label only the emotions in the comment reported in the red-highlighted column (Comment N). The other comments (Comment N-1, until Comment 1), if available, are the preceding comments of an issue report, meant to explain the context of Comment N.

> Round1-SpreadsheetX: this document contains the comments that you have to label in Round 1.

The deadline for the results of round 1 are due XXXXX. Thanks again for participating and for returning your results on time!
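
The labeling convention described in the email (an x per observed emotion, several allowed, no x meaning Neutral) can be sketched as follows. The function name `labels_from_row` and the dictionary representation of a spreadsheet row are hypothetical; the source specifies only the convention, not any tooling.

```python
# The six emotions named in the email sent to raters.
EMOTIONS = ("Love", "Joy", "Sadness", "Fear", "Anger", "Surprise")


def labels_from_row(cells: dict[str, str]) -> list[str]:
    """Turn one spreadsheet row into a list of emotion labels.

    `cells` maps an emotion name to the cell content; a cell holding
    "x" (case-insensitive) marks that emotion as present.
    """
    marked = [e for e in EMOTIONS if cells.get(e, "").strip().lower() == "x"]
    # No mark at all means the comment is Neutral.
    return marked or ["Neutral"]


print(labels_from_row({"Love": "x", "Joy": "X"}))
print(labels_from_row({}))
```

Keeping Neutral as the absence of marks, rather than as a seventh column, mirrors the email's instruction and prevents a row from being both Neutral and emotional.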

Appendix B: Analysis of Full Study Excluding the Authors

To assess the impact of the learning effect between the pilot study and the full study on the ratings made by the first four authors, this appendix analyzes the results of the full study after removing the ratings of the first four authors. As specified in Section 7, we re-analyze RQ1 and RQ2 using a dataset of 210 comments labeled by the 12 raters not involved in the pilot study (i.e., excluding the authors). To simplify the comparison, Table 19 maps the tables reported in the original case study results to the new ones.

Table 19 Mapping between appendix and case study tables
Table 20 Percentage of agreement out of 210 comments (absolute number in parentheses) and Cohen κ values (with confidence intervals) for each emotion in RQ1 — Rater’s Agreement (full study)
Table 21 Percentage of agreement out of 210 comments between four (2nd column) and at least three (3rd column) raters, together with the Fleiss κ inter-rater agreement and confidence intervals (4th to 6th column)

Note that in Tables 20, 21 and 22 the confidence interval is ± 7% instead of the ± 5% used in Tables 9, 10 and 12. This is because the sample used for the tables in this appendix consists of only 210 comments.
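
The widening of the interval with a smaller sample can be illustrated with the standard worst-case margin of error for a proportion at a 95% confidence level. This is a sketch under assumptions: the text above does not state which formula was used, the function name `worst_case_margin` is hypothetical, and the ± 5% figure is assumed to refer to the 392-comment sample of the full study.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Margin of error for a proportion, using p = 0.5 (the worst case).

    z = 1.96 corresponds to a 95% confidence level; no finite-population
    correction is applied.
    """
    return z * math.sqrt(0.25 / n)


for sample_size in (392, 210):
    print(sample_size, f"{worst_case_margin(sample_size):.1%}")
```

With these assumptions the margin is about 4.9% for 392 comments and about 6.8% for 210 comments, in line with the ± 5% and ± 7% quoted above.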

Table 22 Percentage of agreement and Cohen κ values (with confidence intervals) for comments without and with context (RQ2 — Context Influence)
Table 23 The number of times raters changed their rating from the rating in a row (comment without context) to the one in a column (comment with context)
Table 24 How often raters went from disagreement (d) to agreement (a) or vice versa when comparing the set of comments without context (rows) to the set of comments with context (columns), for groups A, B, and when combining both groups (at least three raters agreeing)
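
For readers unfamiliar with the Cohen κ values reported in these tables, the following sketch computes κ for two raters and a single emotion, treating each comment as a binary decision (emotion present or absent). The function name `cohen_kappa` and the example labels are illustrative only and are not taken from the study's data.

```python
def cohen_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters making a binary decision per comment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set")
    n = len(rater_a)
    # Observed agreement: fraction of comments with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal rate of "present".
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Illustrative labels for four comments (True = emotion observed).
example = cohen_kappa([True, True, False, False], [True, False, False, False])
print(example)
```

Because κ discounts the agreement expected by chance, it can be much lower than the raw percentage of agreement when an emotion is rare, which is why the tables report both figures.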

Cite this article

Murgia, A., Ortu, M., Tourani, P. et al. An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems. Empir Software Eng 23, 521–564 (2018). https://doi.org/10.1007/s10664-017-9526-0

  • DOI: https://doi.org/10.1007/s10664-017-9526-0
