Online collaborative platforms provide an environment in which diverse developers collaborate on Open Source Software (OSS) projects. Previous studies in Software Engineering have shown the benefits of increasing gender and tenure diversity in OSS projects. However, little is known about the role of racial and ethnic diversity in OSS projects. We present an empirical study that analyzes how the racial and ethnic diversity of peer members in a collaborative group relates to the frequency of their collaborative contributions in OSS projects. We performed a large-scale quantitative analysis of the relationship between the race and ethnicity of peer members in a collaborative group and the frequency of their collaborative contributions on GitHub. We first inferred the peers working in collaborative groups within a project based on the collaboration between the developers in that project. We then used the Name-Prism tool to infer the race and ethnicity of each collaborative group’s peers from the names they use on GitHub. Finally, we used mixed effects regression modeling of the group members’ contributions, measured by the total number of pull requests merged as a collaborative group, to assess the relationship between the racial and ethnic diversity of the members in a collaborative group and the frequency of their collaborative contributions. Our results indicate that (1) a major part of the developer population is White; (2.1) the distribution of contributions differs between collaborative groups that are homogeneous and those that are heterogeneous with respect to the race and ethnicity of their members, with heterogeneous groups having a higher median number of contributions than homogeneous groups; and (2.2) the racial and ethnic diversity of the members in a collaborative group has a statistically significant relationship with the frequency of the group members’ contributions.
The racial and ethnic diversity of peer members in a collaborative group may have a role to play in the frequency of groups’ contributions in OSS. Hence, further research is needed to understand how the diverse racial and ethnic composition of collaborative group members leads to a higher rate of group contributions.
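To make the homogeneous versus heterogeneous comparison concrete, the following is a minimal sketch, not the study's actual pipeline. The abstract does not name the diversity measure, so the use of Blau's heterogeneity index here is an assumption, as are the function name `blau_index`, the placeholder category labels, and the toy contribution counts. The mixed effects regression step is not shown.

```python
from collections import Counter
from statistics import median


def blau_index(labels):
    """Blau's heterogeneity index: 1 - sum(p_i^2) over category shares.

    Assumption: the abstract does not state which diversity measure was used.
    """
    if not labels:
        raise ValueError("a group needs at least one member")
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def is_homogeneous(labels):
    """A group is homogeneous when all members share one category."""
    return len(set(labels)) == 1


# Hypothetical toy data: (member category labels, pull requests merged as a group).
groups = [
    (["A", "A", "A"], 4),
    (["A", "B", "A"], 9),
    (["B", "B"], 3),
    (["A", "B", "C"], 12),
]

homogeneous = [prs for labels, prs in groups if is_homogeneous(labels)]
heterogeneous = [prs for labels, prs in groups if not is_homogeneous(labels)]

# Compare the median contribution count of the two kinds of group.
median_homogeneous = median(homogeneous)
median_heterogeneous = median(heterogeneous)

for labels, prs in groups:
    print(labels, round(blau_index(labels), 3), prs)
print("median (homogeneous):", median_homogeneous)
print("median (heterogeneous):", median_heterogeneous)
```

The index is 0 for a group whose members all share one category and grows as members spread more evenly across categories, which is why it is a common choice for categorical diversity. The numbers above are invented for illustration and say nothing about the study's data.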




Data Availability
We are sharing our dataset in the interest of encouraging others to replicate and build upon our work. The data can be found here: https://github.com/uw-swag/2022-EMSE-Race-and-Ethnicity-Diversity-and-Collaborative-Contributions.
In addition, we have added an online Appendix (https://github.com/uw-swag/2022-EMSE-Race-and-Ethnicity-Diversity-and-Collaborative-Contributions/blob/main/Appendix.md) to help readers better understand our method rationale.
GitHub is one of the most important online collaborative platforms, with more than 40M developers contributing to OSS projects (https://octoverse.github.com).
Ethics declarations
Conflicts of Interests
The authors declared that they have no conflict of interest.
Additional information
Communicated by: Rafael Prikladnicki.
About this article
Cite this article
Shameer, S., Rodríguez-Pérez, G. & Nagappan, M. Relationship between diversity of collaborative group members’ race and ethnicity and the frequency of their collaborative contributions in GitHub. Empir Software Eng 28, 83 (2023). https://doi.org/10.1007/s10664-023-10313-y