Abstract
Technical collaboration between multiple contributors is a natural phenomenon in distributed open source software development projects. Macro-collaboration, where each code commit is attributed to a single collaborator, has been extensively studied in the research literature. This is much less the case for so-called micro-collaboration practices, in which multiple authors contribute to the same commit. To support such practices, GitLab and GitHub started supporting social coding mechanisms such as the “Co-Authored-By:” trailer in commit messages, which in turn enables the empirical study of such micro-collaboration. To understand the mechanisms, benefits and limitations of micro-collaboration, this article provides an exemplar case study of collaboration practices in the OpenStack ecosystem. Following a mixed-method research approach, we provide qualitative evidence through a thematic and content analysis of semi-structured interviews with 16 OpenStack contributors. We contrast their perception with quantitative evidence gained by statistical analysis of the git commit histories (\(\sim \)1M commits) and Gerrit code review histories (\(\sim \)631K change sets and \(\sim \)2M patch sets) of 1,804 OpenStack project repositories over a 9-year period. Our findings provide novel empirical insights to practitioners to promote micro-collaborative coding practices, and to academics to conduct further research towards understanding and automating the micro-collaboration process.













Notes
The replication package can be found on Zenodo: https://doi.org/10.5281/zenodo.5759968
There are identity pairs in \(\mathcal {P}\) that contain the same name-email pairs, but the username information only appears in one of them. If we count identities in \(\mathcal {P}\) as name-email pairs, the number of identities is 18,081.
According to Hess and Kromrey (2004) we interpret effect size as negligible (d < 0.147), small (0.147 ≤ d < 0.33), medium (0.33 ≤ d < 0.474) or large (d ≥ 0.474).
References
Al-Subaihin A A, Sarro F, Black S, Capra L, Harman M (2021) App store effects on software engineering practices. Trans Softw Eng 47(2):300–319
An L, Khomh F, Guéhéneuc Y-G (2018) An empirical study of crash-inducing commits in Mozilla Firefox. Softw Qual J 26:553–584
Arya D, Wang W, Guo J L C, Cheng J (2019) Analysis and detection of information types of open source software issue discussions. In: International conference on software engineering, pp 454–464
Avelino G, Passos L, Hora A, Valente M T (2017) Assessing code authorship: The case of the Linux kernel. In: Open source systems: towards robust practices. Springer, pp 151–163
Bagozzi R P, Yi Y (2012) Specification, evaluation, and interpretation of structural equation models. Journal of the Academy of Marketing Science 40(1):8–34
Beran T N, Violato C (2010) Structural equation modeling in medical research: a primer. BMC Research Notes 3(1):1–10
Bernard H R, Wutich A, Ryan G W (2016) Analyzing qualitative data: Systematic approaches. SAGE publications, Thousand Oaks
Bick S, Spohrer K, Hoda R, Scheerer A, Heinzl A (2018) Coordination challenges in large-scale software development: A case study of planning misalignment in hybrid settings. Trans Softw Eng 44(10):932–950
Bird C (2016) Interviews. In: Perspectives on data science for software engineering. Morgan Kaufmann, Burlington
Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: International working conference on mining software repositories. ACM, pp 137–143
Bird C, Zimmermann T (2012) Assessing the value of branches with what-if analysis. In: International symposium on foundations of software engineering. ACM SIGSOFT
Bogart C, Kästner C, Herbsleb J, Thung F (2021) When and how to make breaking changes: Policies and practices in 18 open source software ecosystems. Trans Softw Eng Methodol, 30(4)
Borg M, Svensson O, Berg K, Hansson D (2019) SZZ unleashed: An open implementation of the SZZ algorithm. In: MaLTeSQuE. ACM, pp 7–12
Brooks FP Jr (1974) The mythical man-month. Addison-Wesley Reading, United States
Campbell J L, Quincy C, Osserman J, Pedersen O K (2013) Coding in-depth semistructured interviews. Sociological Methods & Research 42 (3):294–320
Casalnuovo C, Vasilescu B, Devanbu P, Filkov V (2015) Developer onboarding in GitHub: The role of prior social links and language experience. In: Joint Meeting on ESEC and FSE. ACM, pp 817–828
Cassee N, Kitsanelis C, Constantinou E, Serebrenik A (2021) Human, bot or both? a study on the capabilities of classification models on mixed accounts. In: 2021 IEEE international conference on software maintenance and evolution (ICSME), pp 654–658
Cliff N (1993) Dominance statistics: Ordinal analyses to answer ordinal questions. Psychol Bull 114(3):494–509
Costa C, Figueiredo J, Murta L, Sarma A (2016) TIPMerge: recommending experts for integrating changes across branches. In: International symposium on foundations of software engineering, pp 523–534
Cruzes DS, Dyba T (2011) Recommended steps for thematic synthesis in software engineering. In: International symposium on empirical software engineering and measurement, pp 275–284
D’Ambros M, Lanza M, Robbes R (2012) Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir Softw Eng 17(4):531–577
Datta S (2018) How does developer interaction relate to software quality? An examination of product development data. Empir Softw Eng 23(3):1153–1187
de Souza Costa C, Figueiredo J J, Pimentel J F, Sarma A, Murta L G P (2019) Recommending participants for collaborative merge sessions. Trans Softw Eng, 1–1
Dingsøyr T, Moe N B, Fægri T E, Seim E A (2018) Exploring software development at the very large-scale: A revelatory case study and research agenda for agile method adaptation. Empir Softw Eng 23:490–520
DiStaso M W, Bortree D S (2012) Multi-method analysis of transparency in social media practices: Survey, interviews and content analysis. Public Relat Rev 38(3):511–514
Egelman C D, Murphy-Hill E, Kammer E, Hodges M M, Green C, Jaspan C, Lin J (2020) Predicting developers’ negative feelings about code review. In: International conference on software engineering. IEEE, pp 174–185
Fan Y, Xia X, Alencar da Costa D, Lo D, Hassan A E, Li S (2019) The impact of changes mislabeled by SZZ on just-in-time defect prediction. Trans Softw Eng, 26
Forsgren N, Storey M-A, Maddila C, Zimmermann T, Houck B, Butler J (2021) The SPACE of developer productivity: There’s more to it than you think. Queue 19(1):20–48
Foundjem A, Adams B (2021) Release synchronization in software ecosystems. Empir Softw Eng. 26(34)
Foundjem A, Constantinou E, Mens T, Adams B (2021a) Replication package — V2.0.0. https://doi.org/10.5281/zenodo.5759968, Online
Foundjem A, Eghan E E, Adams B (2021b) Onboarding vs. diversity, productivity and quality – empirical study of the OpenStack ecosystem. In: International conference on software engineering, pp 1033–1045
Fusch P I, Ness L R (2015) Are we there yet? data saturation in qualitative research. The qualitative report 20(9):1408
Garson G D (2013) Path analysis. Statistical Associates Publishing Asheboro, NC
German D M, Adams B, Hassan A E (2016) Continuously mining distributed version control systems: An empirical study of how Linux uses git. Empir Softw Eng 21(1):260–299
Ghaiumy Anaraky R, Li Y, Knijnenburg B (2021) Difficulties of measuring culture in privacy studies. Proc ACM Hum.-Comput Interact, 5(CSCW2). [Online]. Available: https://doi.org/10.1145/3479522
Goeminne M, Mens T (2013) A comparison of identity merge algorithms for software repositories. Sci Comput Program 78:971–986
Golzadeh M, Decan A, Constantinou E, Mens T (2021) Identifying bot activity in github pull request and issue comments. In: 2021 IEEE/ACM third international workshop on bots in software engineering (BotSE), pp 21–25
Gopal A, Mukhopadhyay T, Krishnan M S (2005) The impact of institutional forces on software metrics programs. Trans Softw Eng 31(8):679–694
Guest G, Bunce A, Johnson L (2006) How many interviews are enough? an experiment with data saturation and variability. Field Methods 18 (1):59–82
Henley A Z, Muçlu K, Christakis M, Fleming S D, Bird C (2018) CFar: A tool to increase communication, productivity, and review quality in collaborative code reviews. In: CHI. ACM, pp 1–13
Hess M, Kromrey J (2004) Robust confidence intervals for effect sizes: A comparative study of Cohen’s d and Cliff’s delta under non-normality and heterogeneous variances. AERA, 1–30
Himmelsbach J, Schwarz S, Gerdenitsch C, Wais-Zechmann B, Bobeth J, Tscheligi M (2019) Do we care about diversity in human computer interaction. In: International conference on human factors in computing systems. ACM, pp 1–16
Igolkina A A, Meshcheryakov G (2020) semopy: A python package for structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal 27(6):952–963. [Online]. Available: https://doi.org/10.1080/10705511.2019.1704289
Islam M S, Khreich W, Hamou-Lhadj A (2018) Anomaly detection techniques based on kappa-pruned ensembles. IEEE Trans Reliab 67(1):212–229
Izquierdo-Cortazar D, Sekitoleko N, Gonzalez-Barahona JM, Kurth L (2017) Using metrics to track code review performance. In: Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, EASE’17. [Online]. Available: https://doi.org/10.1145/3084226.3084247. Association for Computing Machinery, New York, pp 214–223
Johnson D R, Creech J C (1983) Ordinal measures in multiple indicator models: A simulation study of categorization error. Am Sociol Rev, 398–407
Kalliamvakou E, Damian D, Blincoe K, Singer L, German D M (2015) Open source-style collaborative development practices in commercial projects using GitHub. In: International conference on software engineering. IEEE, pp 574–585
Kang H, Ahn J-W (2021) Model setting and interpretation of results in research using structural equation modeling: A checklist with guiding questions for reporting. Asian Nursing Research 15(3):157–162
Kim M, Zimmermann T, DeLine R, Begel A (2016) The emerging role of data scientists on software development teams. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, pp 96–107
Klem L (2000) Structural equation modeling
Kononenko O, Baysal O, Godfrey M W (2016) Code review quality: How developers see it. In: Proceedings of the 38th international conference on software engineering, ICSE ’16. [Online]. Available: https://doi.org/10.1145/2884781.2884840. Association for Computing Machinery, New York, pp 1028–1038
Kovalenko V, Bacchelli A (2018) Code review for newcomers: Is it different?. In: Proceedings of the 11th international workshop on cooperative and human aspects of software engineering, CHASE ’18. [Online]. Available: https://doi.org/10.1145/3195836.3195842. Association for Computing Machinery, New York, pp 29–32
Krusche S, Berisha M, Bruegge B (2016) Teaching code review management using branch based workflows. In: International conference on software engineering. ACM, pp 384–393
Landis J R, Koch G G (1977) The measurement of observer agreement for categorical data. Biometrics, 33(1)
Lenberg P, Tengberg L G W, Feldt R (2017) An initial analysis of software engineers’ attitudes towards organizational change. Empir Softw Eng 22 (4):2179–2205
Lui KM, Chan KCC (2006) Pair programming productivity: Novice–novice vs. expert–expert. International Journal of Human-Computer Studies 64 (9):915–925. https://doi.org/10.1016/j.ijhcs.2006.04.010
Mardi F, Miller K, Balcerzak P (2021) Novice - expert pair coaching: Teaching Python in a pandemic. In: Technical symposium on computer science education. ACM, pp 226–231
McHugh M L (2013) The chi-square test of independence. Biochemia Medica: Biochemia Medica 23(2):143–149
McIntosh S, Kamei Y, Adams B, Hassan A E (2014) The impact of code review coverage and code review participation on software quality. In: Working conference on mining software repositories. ACM, pp 192–201
McIntosh S, Kamei Y, Adams B, Hassan A E (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189
Mens T, Cataldo M, Damian D (2019) The social developer: The future of software development. IEEE Softw 36(1):4
Meshcheryakov G, Igolkina A A, Samsonova M G (2021) semopy 2: A structural equation modeling package with random effects in python. arXiv:2106.01140
Meyer A N, Barr E T, Bird C, Zimmermann T (2019) Today was a good day: The daily life of software developers. IEEE Trans Softw Eng 47 (5):863–880
Mlouki O, Khomh F, Antoniol G (2016) On the detection of licenses violations in the android ecosystem. In: International conference on software analysis, evolution, and reengineering. IEEE, pp 382–392
Mukadam M, Bird C, Rigby PC (2013) Gerrit software code review data from Android. In: International working conference on mining software repositories, pp 45–48
Neto E C, d Costa D A, Kulesza U (2019) Revisiting and improving SZZ implementations. In: International symposium on empirical software engineering and measurement, pp 1–12
Neumayr T, Jetter H-C, Augstein M, Friedl J, Luger T (2018) Domino: A descriptive framework for hybrid collaboration and coupling styles in partially distributed teams. Human-Computer Interaction, 24
Oliveira E, Fernandes E, Steinmacher I, Cristo M, Conte T, Garcia A (2020) Code and commit metrics of developer productivity: a study on team leaders perceptions. Empir Softw Eng 25(4):2519–2549
Plonka L, Sharp H, van der Linden J, Dittrich Y (2015) Knowledge transfer in pair programming: An in-depth analysis. International Journal of Human-Computer Studies 73:66–78
Rahman M T (2015) Investigating modern release engineering practices. In: International conference on software analysis, evolution, and reengineering. IEEE, pp 607–608
Rich J T, Neely J G, Paniello R C, Voelker C C J, Nussenbaum B, Wang E W (2010) A practical guide to understanding kaplan-meier curves. Otolaryngology–head and neck surgery 143(3):331–6
Rigby P, Cleary B, Painchaud F, Storey M-A, German D (2012) Contemporary peer review in action: Lessons from open source development. IEEE Softw 29(6):56–61
Rigby P C, German D M, Storey M-A (2008) Open source software peer review practices: A case study of the Apache server. In: International conference on software engineering. ACM, pp 541–550
Rodriguez G, Robles G, Gonzalez-Barahona J (2018) Reproducibility and credibility in empirical software engineering. Inf Softw Technol 99:164–176
Runeson P, Host M, Rainer A, Regnell B (2012) Case study research in software engineering: Guidelines and examples, 1st edn. Wiley, Hoboken
Saldaña J (2015) The coding manual for qualitative researchers. SAGE Publications, Thousand Oaks
Salleh N, Hoda R, Su M T, Kanij T, Grundy J (2018) Recruitment, engagement and feedback in empirical software engineering studies in industrial contexts. Inf Softw Technol 98:161–172
Satorra A, Bentler P M (2001) A scaled difference chi-square test statistic for moment structure analysis. Psychometrika 66(4):507–514
Sharp H, Robinson H (2008) Collaboration and co-ordination in mature extreme programming teams. International Journal of Human-Computer Studies 66(7):506–518
Siegmund J, Kästner C, Liebig J, Apel S, Hanenberg S (2014) Measuring and modeling programming experience. Empir Softw Eng 19(5):1299–1334
Śliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes?. SIGSOFT Softw Eng Notes 30(4):1–5
Spadini D, Aniche M, Bacchelli A (2018) PyDriller: Python framework for mining software repositories. In: Joint Meeting on ESEC and FSE. ACM, p 3
Spohrer A H K, Kude T, Schmidt C T (2013) Peer-based quality assurance in information systems and development: A transactive memory perspective. In: International conference on information systems
Steinmacher I, Silva M A G, Gerosa M A (2014) Barriers faced by newcomers to open source projects. In: Open Source Software: Mobile Open Source Technologies. Springer
Terzimehić N, Häuslschmid R, Hussmann H, schraefel mc (2019) A review & analysis of mindfulness research in HCI. In: Conference on human factors in computing systems. ACM, pp 1–13
Tong C, Wong S K-S, Lui K P-H (2012) The influences of service personalization, customer satisfaction and switching costs on e-loyalty. International Journal of Economics and Finance 4(3):105–114
Treiblmaier H, Filzmoser P (2010) Exploratory factor analysis revisited: How robust methods support the detection of hidden multivariate data structures in is research. Information & Management 47(4):197–207
Tufano M, Palomba F, Bavota G, Oliveto R, Penta M D, De Lucia A, Poshyvanyk D (2017) When and why your code starts to smell bad (and whether the smells go away). Trans Softw Eng 43(11):1063–1088
Vallat R (2018) Pingouin: statistics in python. Journal of Open Source Software 3(31):1026. [Online]. Available: https://doi.org/10.21105/joss.01026
Wen M, Wu R, Liu Y, Tian Y, Xie X, Cheung S-C, Su Z (2019) Exploring and exploiting the correlations between bug-inducing and bug-fixing commits. In: Joint Meeting on ESEC and FSE. ACM, pp 326–337
Whitehead J (2007) Collaboration in software engineering: A roadmap. In: Future of software engineering, pp 214–225
Wohlin C, Runeson P, Höst M, Ohlsson M C, Regnell B, Wesslén A (2012) Experimentation in software engineering. Springer Science & Business Media, Berlin
Xia X, Lo D, Wang X, Yang X (2015) Who should review this change?: Putting text and file location analyses together for more accurate recommendations. In: 2015 IEEE international conference on software maintenance and evolution (ICSME), pp 261–270
Young J-G, Casari A, McLaughlin K, Trujillo M Z, Hébert-Dufresne L, Bagrow J P (2021) Which contributions count? analysis of attribution in open source. In: International working conference on mining software repositories. IEEE
Zhang Y, Zhou M, Mockus A, Jin Z (2021) Companies’ participation in OSS development – an empirical study of OpenStack. IEEE Trans Softw Eng 47(10):2242–2259
Zhang Y, Zhou M, Stol K-J, Wu J, Jin Z (2020) How do companies collaborate in open source ecosystems?. In: International conference on software engineering. ACM, pp 1196–1208
Zhou S, Vasilescu B, Kästner C (2020) How has forking changed in the last 20 years? A study of hard forks on GitHub. In: International conference on software engineering. ACM, pp 445–456
Funding
This research is partially supported by the F.R.S.-FNRS under Grant numbers T.0017.18, J.0151.20, and O.0157.18F-RG43 (Excellence of Science project SECO-Assist), as well as by the FRQ-F.R.S.-FNRS under Grant number 264544 (bilateral Québec-Wallonia project SECOHealth).
Ethics declarations
Ethics approval
The interviews performed during this study were subject to ethics certificate CER-1617-40, governed by the ethics board of Polytechnique Montreal.
Conflict of Interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Communicated by: Alexandre Bergel
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Questions for Guiding the OpenStack Interviews on Co-Authoring
A.1 Demographics
This first set of questions allows us to determine the profile and role of each interviewee within OpenStack.
1. How and why did you start to get involved in OpenStack?
2. What is your role in OpenStack? (And how did your role evolve over time?)
3. Which and how many OpenStack projects have you been involved in?
4. For how long have you been involved in OpenStack (and in these specific projects)?
A.2 Generic Questions
[These questions will be asked to each interviewee, regardless of his or her profile.]
1. Which mechanisms are you aware of (or have personal experience with) for making joint contributions to OpenStack projects with other persons? [If the question is too unclear, provide concrete examples to the interviewee, e.g., internally visible branch, externally visible branch, emails, Slack, same commit, IRC ...]
2. Are you aware of (or familiar with) the possibility to co-author commits in OpenStack projects?
3. If yes:
   - Are co-authored commits common in the OpenStack projects you are involved in?
   - What value, if any, does commit co-authoring bring to the OpenStack projects you are involved in?
   - What are the drawbacks, if any, that commit co-authoring brings to the ecosystem?
   - Did/does the practice of co-authored commits improve the onboarding experience? Go to C1 or D1.
4. If no:
   - For Foundation members, continue at question C2.
   - For all other interviewees: end of interview.
A.3 Questions for OpenStack Foundation Members
[These questions will only be asked to OpenStack Foundation members.]
1. If the interviewee is aware of the possibility to co-author commits:
   (a) In general, why do co-authored commits happen in OpenStack?
   (b) Does OpenStack actively encourage co-authored commits? Why (not)?
   (c) Are OpenStack ecosystem members satisfied with the way in which co-authored commits are supported process-wise or tool-wise? Do you see room for improvement? How?
2. How is OpenStack (or the specific projects you are or have been involved in) dealing with contributor onboarding, i.e., trying to attract and retain new contributors? Which techniques and/or processes are used to support this?
3. How is OpenStack trying to reduce contributor turnover, and more specifically, how is it trying to prevent key contributors from abandoning the project?
4. Apart from the above issues, according to your personal experience, what other social, technical or organizational health problems is OpenStack confronted with, including its community and its open source code base?
5. How are these problems being addressed? For those problems that are not addressed yet, how should they be addressed?
A.4 Questions for OpenStack Practitioners
[These questions will only be asked to software developers involved in OpenStack projects.]
1. Have you yourself been involved in co-authoring commits? For which projects (within and beyond OpenStack)?
2. If yes:
   (a) How frequently have you co-authored commits?
   (b) What were the reasons for, and goals of, co-authoring the commits you were involved in (as opposed to individually authoring them)?
   (c) Are you aware of other reasons/goals of co-authored commits?
   (d) How much experience did you have in OpenStack when you started co-authoring commits?
   (e) What were the characteristics of the persons you co-authored with (juniors, seniors, experts in specific topics, …)?
   (f) What process do you use for co-authoring commits with other contributors (communication, division of tasks, …)?
   (g) Who becomes the “principal author” (i.e., the author recorded in Git)?
   (h) Do you explicitly mark your co-authored commits using “co-authored trailers” in commit messages? Why (not)?
   (i) Are you satisfied with the way in which co-authored commits are supported by OpenStack, both process-wise and tool-wise? Do you see room for improvement? How?
3. If not at all:
   (a) Was it an explicit decision not to get involved in co-authoring?
   (b) If yes, why?
   (c) If not, do you see:
       (i) any value that commit co-authoring could bring to you?
       (ii) any drawbacks that commit co-authoring could bring to you?
Cite this article
Foundjem, A., Constantinou, E., Mens, T. et al. A mixed-methods analysis of micro-collaborative coding practices in OpenStack. Empir Software Eng 27, 120 (2022). https://doi.org/10.1007/s10664-022-10167-w