Relationship between diversity of collaborative group members’ race and ethnicity and the frequency of their collaborative contributions in GitHub

Empirical Software Engineering

Abstract

Online collaborative platforms provide an environment in which diverse developers collaborate on Open Source Software (OSS) projects. Previous studies in Software Engineering have shown the benefits of increasing gender and tenure diversity in OSS projects. However, little is known about the role of racial and ethnic diversity in OSS projects. This empirical study analyzes how the racial and ethnic diversity of peer members in a collaborative group relates to the frequency of their collaborative contributions in OSS projects. We performed a large-scale quantitative analysis of the relationship between the race and ethnicity of peer members in a collaborative group and the frequency of their collaborative contributions in GitHub. We first inferred the peers working in collaborative groups within a project based on the collaboration between the developers in that project. We then used the Name-Prism tool to infer the race and ethnicity of each collaborative group’s peers from the names they use in GitHub. Finally, we used mixed effects regression modeling of the group members’ contributions (measured by the total number of pull requests merged as a collaborative group) to assess the relationship between the racial and ethnic diversity of the members in a collaborative group and the frequency of their collaborative contributions. Our results indicate that (1) White developers make up a major part of the developer population; (2.1) the distribution of contributions differs between homogeneous and heterogeneous collaborative groups, with respect to the race and ethnicity of the groups’ members, with heterogeneous groups having a higher median number of contributions than homogeneous groups; and (2.2) the diversity of race and ethnicity of members in a collaborative group has a statistically significant relationship with the frequency of the collaborative group members’ contributions.
The racial and ethnic diversity of peer members in a collaborative group may have a role to play in the frequency of groups’ contributions in OSS. Hence, further research is needed to understand how the diverse racial and ethnic composition of collaborative group members leads to a higher rate of group contributions.
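
The homogeneous versus heterogeneous comparison reported in the results can be illustrated with a minimal sketch. It assumes a simple rule that is not stated in the abstract: a group counts as heterogeneous when its members fall into more than one inferred category. The group records, the category labels (`cat_a`, `cat_b`, `cat_c`), and the pull request counts below are made up for illustration and are not taken from the study's dataset; the function names are likewise hypothetical.

```python
from statistics import median

# Hypothetical toy data (not from the study): each collaborative group lists
# the inferred category of every member and the number of pull requests
# merged as a group.
groups = [
    {"members": ["cat_a", "cat_a", "cat_a"], "merged_prs": 4},
    {"members": ["cat_a", "cat_b"], "merged_prs": 9},
    {"members": ["cat_c", "cat_c"], "merged_prs": 2},
    {"members": ["cat_a", "cat_b", "cat_c"], "merged_prs": 7},
    {"members": ["cat_b", "cat_b"], "merged_prs": 5},
    {"members": ["cat_a", "cat_c"], "merged_prs": 6},
]


def is_heterogeneous(members):
    """Assumed rule: more than one distinct category among the members."""
    return len(set(members)) > 1


def median_contributions(groups):
    """Return the median merged-PR count for each kind of group."""
    buckets = {"homogeneous": [], "heterogeneous": []}
    for group in groups:
        kind = "heterogeneous" if is_heterogeneous(group["members"]) else "homogeneous"
        buckets[kind].append(group["merged_prs"])
    # Skip a bucket that received no groups, since median() needs data.
    return {kind: median(counts) for kind, counts in buckets.items() if counts}


result = median_contributions(groups)
print(result)
```

With this toy input the sketch reports a median of 4 for homogeneous groups and 7 for heterogeneous groups. The study itself goes further and uses mixed effects regression modeling to assess the relationship, which this sketch does not attempt.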

Data Availability

We are sharing our dataset in the interest of encouraging others to replicate and build upon our work. The data can be found here: https://github.com/uw-swag/2022-EMSE-Race-and-Ethnicity-Diversity-and-Collaborative-Contributions.

In addition, we have added an online Appendix (https://github.com/uw-swag/2022-EMSE-Race-and-Ethnicity-Diversity-and-Collaborative-Contributions/blob/main/Appendix.md) to help readers better understand our method rationale.

Notes

  1. https://diversity.google/

  2. https://www.microsoft.com/en-us/diversity

  3. https://diversity.fb.com/read-report/

  4. https://www.linuxfoundation.org/press-release/2020/10/linux-foundation-focuses-on-science-and-research-to-advance-diversity-and-inclusion-in-software-engineering/

  5. GitHub is one of the most important online collaborative platforms, with more than 40M developers contributing to OSS projects (https://octoverse.github.com).

  6. https://reporeapers.github.io/results/1.html

  7. https://developer.github.com/v3/

  8. https://github.com/18F/tock

  9. https://github.com/tue-mdse/genderComputer

  10. https://CRAN.R-project.org/package=MuMIn

  11. https://latinagirlscode.org

  12. https://www.blackgirlscode.com

  13. https://techqueria.org

  14. https://diversity.google/

  15. https://www.microsoft.com/en-us/diversity

  16. https://diversity.fb.com/read-report/

  17. https://diversity.google/

  18. https://www.microsoft.com/en-us/diversity

  19. https://diversity.fb.com/read-report/

Author information

Corresponding author

Correspondence to Gema Rodríguez-Pérez.

Ethics declarations

Conflicts of Interest

The authors declared that they have no conflict of interest.

Additional information

Communicated by: Rafael Prikladnicki.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shameer, S., Rodríguez-Pérez, G. & Nagappan, M. Relationship between diversity of collaborative group members’ race and ethnicity and the frequency of their collaborative contributions in GitHub. Empir Software Eng 28, 83 (2023). https://doi.org/10.1007/s10664-023-10313-y

  • DOI: https://doi.org/10.1007/s10664-023-10313-y
