Abstract
Software development, like any other human collaboration, inevitably evokes emotions such as joy or sadness, which are known to affect the group dynamics within a team. Today, little is known about those individual emotions and whether they can be discerned at all in the development artifacts produced during a project. This paper analyzes (a) whether issue reports, a common development artifact that is rich in content, convey emotional information and (b) whether humans agree on the presence of these emotions. From the analysis of the issue comments of 117 projects of the Apache Software Foundation, we find that developers express emotions (in particular gratitude, joy and sadness). However, the more context is provided about an issue report, the more human raters start to doubt and nuance their interpretation. Based on these results, we demonstrate the feasibility of a machine learning classifier for identifying issue comments containing gratitude, joy and sadness. Such a classifier, using emotion-driving words and technical terms, achieves good precision and recall for identifying the emotion love, while it achieves a lower recall for joy and sadness.
Notes
In the pilot study, 4 raters were permuted to label 400 comments, 200 comments per rater (cf. Section 4.2.1), while in the full study, 16 raters were permuted to label 392 comments, 98 comments per rater (cf. Section 4.2.2). In both studies, each rater was paired with every other rater the same number of times.
The data set can be downloaded for replication purposes from a website hosted by the University of Antwerp: http://ansymore.uantwerpen.be/system/files/uploads/artefacts/alessandro/MSR16/archive3.zip.
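The workloads reported in the first note determine how many raters labeled each comment. The following minimal sketch works through that arithmetic; it assumes the labeling effort was spread evenly over the comments, and the function name is hypothetical.

```python
def ratings_per_comment(num_raters: int, comments_per_rater: int, num_comments: int) -> float:
    """Average number of raters per comment, assuming an even spread of labels."""
    total_labels = num_raters * comments_per_rater
    return total_labels / num_comments


# Pilot study: 4 raters, 200 comments each, 400 comments in total.
pilot = ratings_per_comment(4, 200, 400)

# Full study: 16 raters, 98 comments each, 392 comments in total.
full = ratings_per_comment(16, 98, 392)

print(f"pilot study: {pilot:g} raters per comment")
print(f"full study:  {full:g} raters per comment")
```

Under this assumption, each comment received two labels in the pilot study and four in the full study.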
Acknowledgements
This work was sponsored by (a) the Institute for the Promotion of Innovation through Science and Technology in Flanders by means of a project entitled Change-centric Quality Assurance (CHAQ) with number 120028, as well as (b) the Regione Autonoma della Sardegna (RAS), Regional Law No. 7-2007, project CRP-17938, “LEAN 2.0”.
Additional information
Communicated by: Sunghun Kim
Appendices
Appendix A: Email Sent to Raters
To ensure that participants understood the emotions without being biased during the labeling process, we provided only minimal and dry training. Once they had agreed to participate in an “ongoing experiment”, we sent them an email to clarify the goal of the experiment. The participants knew neither how many other participants were involved in the experiment nor its underlying goals. All the experiments were carried out via Google Spreadsheets. The email we sent to participants follows.
Dear XXXXX,
We are performing an experiment on emotions in bug reports, and we would like you to participate in this experiment.
We have created a dataset containing bug report comments by real open source developers. Your task would be to label these comments using a mixture of 6 emotions: Love, Joy, Sadness, Fear, Anger or Surprise. If no emotion can be observed, then the comment automatically is labeled as Neutral.
Attached to this mail, you can find a document that describes the 6 emotions that we use for the experiment. Moreover, it provides some examples of emotion labeling. Please take a look.
Following the link: XXXXX
You get access to a spreadsheet with 2 pages:
> ExampleLabeling: describes an example on how to label text comments. If you think an emotion can be observed in the comment, there will be an x in the corresponding cell. Multiple cells can be selected if multiple emotions are present. Absence of any x means that that comment is Neutral. You have to label only the emotions in the comment reported in the red-highlighted column (Comment N). The other comments (Comment N-1, until Comment 1), if available, are the preceding comments of an issue report, meant to explain the context of Comment N.
> Round1-SpreadsheetX: this document contains the comments that you have to label in Round 1.
The deadline for the results of round 1 are due XXXXX. Thanks again for participating and for returning your results on time!
Appendix B: Analysis of Full Study Excluding the Authors
To assess the impact of the learning effect between the pilot study and the full study on the ratings made by the first four authors, this appendix re-analyzes the results of the full study without the ratings of the first four authors. As specified in Section 7, we re-analyze RQ1 and RQ2 using a dataset of 210 comments labeled by the 12 raters not involved in the pilot study (i.e., excluding the authors). To simplify the comparison, Table 19 maps the tables reported in the original case study results to the new ones.
Note that in Tables 20, 21 and 22 the confidence interval is ± 7% instead of the ± 5% used in Tables 9, 10 and 12. This is because the sample used for the tables in this appendix consists of only 210 comments.
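The widening of the interval from ± 5% to ± 7% is consistent with the standard margin-of-error formula for a proportion. The sketch below is an illustration under stated assumptions, not necessarily the computation used in the study: it assumes a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, no finite-population correction, and that the ± 5% figure refers to the 392 comments of the full study. The function name is hypothetical.

```python
import math


def margin_of_error(sample_size: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumptions (not stated in the source): 95% confidence (z = 1.96),
    worst-case proportion p = 0.5, no finite-population correction.
    """
    return z * math.sqrt(p * (1.0 - p) / sample_size)


for n in (392, 210):
    print(f"n = {n}: margin of error = {100 * margin_of_error(n):.1f}%")
```

Rounded to whole percentages, the two sample sizes yield margins of 5% and 7% respectively, which matches the values reported for the two sets of tables.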
Cite this article
Murgia, A., Ortu, M., Tourani, P. et al. An exploratory qualitative and quantitative analysis of emotions in issue report comments of open source systems. Empir Software Eng 23, 521–564 (2018). https://doi.org/10.1007/s10664-017-9526-0
DOI: https://doi.org/10.1007/s10664-017-9526-0