How do developers collaborate? Investigating GitHub heterogeneous networks

Software Quality Journal

Abstract

Assessing the collaboration among developers is important for understanding different aspects of the software lifecycle, including code smell intensity, bug fixes, and software quality. Such collaboration can be captured by social networks, which represent interactions between individuals in different contexts. In this paper, we model GitHub developers’ collaborations as a heterogeneous network by considering three aspects: social collaboration, collaboration time in a repository, and technical features. Then, we explore the GitHub network from different perspectives: size, relevance, and potential applications. The results show that the considered metrics are not correlated and therefore bring new information about the collaborations. We also show that such information is useful for social developer ranking, a practical task that is often part of different applications, such as team formation, community detection, and pair programming. Finally, as software quality is intrinsic to the people who write the code, our methodology and analyses represent initial steps towards people-centered software quality analysis, as further discussed throughout this article.
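
As a rough illustration of how several collaboration aspects can coexist in one network and feed a developer ranking, the sketch below attaches one weight per aspect to each developer pair, ranks developers separately by each aspect (summing the weights of their incident edges), and aggregates the per-aspect rankings with a Borda count (cf. Emerson, 2013). The developer names, the attribute keys `social`, `time`, and `technical`, the weights, and all helper functions are hypothetical; this is a minimal sketch under those assumptions, not the metrics or the ranking procedure defined in the article.

```python
from collections import defaultdict

# Hypothetical collaboration network: each undirected developer pair carries one
# weight per aspect. Names, keys, and values are illustrative assumptions only.
EDGES = {
    ("alice", "bob"): {"social": 3, "time": 120, "technical": 0.8},
    ("alice", "carol"): {"social": 1, "time": 30, "technical": 0.2},
    ("bob", "dave"): {"social": 2, "time": 200, "technical": 0.5},
    ("carol", "dave"): {"social": 4, "time": 10, "technical": 0.9},
}
ASPECTS = ("social", "time", "technical")


def aspect_scores(edges, aspect):
    """Weighted degree: sum of one aspect's weight over each developer's edges."""
    scores = defaultdict(float)
    for (u, v), weights in edges.items():
        scores[u] += weights[aspect]
        scores[v] += weights[aspect]
    return dict(scores)


def rank(scores):
    """Order developers by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda dev: (-scores[dev], dev))


def borda(rankings):
    """Borda count: in a ranking of n developers, 0-based position i earns n - 1 - i points."""
    points = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, dev in enumerate(ranking):
            points[dev] += n - 1 - i
    return rank(points)


if __name__ == "__main__":
    per_aspect = {a: rank(aspect_scores(EDGES, a)) for a in ASPECTS}
    for aspect, ranking in per_aspect.items():
        print(aspect, ranking)
    print("combined", borda(per_aspect.values()))
```

With these made-up weights, `dave` leads the social and technical rankings while `bob` leads the time ranking, and the Borda aggregation yields `['dave', 'bob', 'carol', 'alice']`. Borda is used here only because it is a simple positional rule; any other rank-aggregation method could be swapped in.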

Availability of data and material

Our dataset, called GitHub Socially Enhanced Dataset (GitSED), is publicly available on Zenodo (Oliveira et al., 2021).

Code availability

All relevant information related to this work is available on the Project Apoena homepage: https://bit.ly/proj-apoena

Notes

  1. GitHub Mastering Issues: https://guides.github.com/features/issues

  2. According to GitHub documentation, a repository owned by a user account has a single owner, and ownership permissions cannot be shared with another user account. The owner may also invite other GitHub users to the repository as collaborators.

  3. Following People: https://help.github.com/en/articles/following-people

  4. GitHub stars: https://help.github.com/en/articles/saving-repositories-with-stars

  5. The initial version of our research was published in Portuguese at a local venue (Rocha et al., 2016).

  6. TIOBE Index: https://www.tiobe.com/tiobe-index

  7. As of July 2021, most of these languages remain in the top 12, with the exceptions of Ruby (17th), Perl (18th), and Pascal (20th). Their places in the top 12 are now filled by SQL (a database language, 10th), Classic Visual Basic (11th), and R (an environment-oriented language, 12th). Hence, our analyses remain relevant to the most used programming languages.

  8. GitHub’s state of Octoverse: https://octoverse.github.com/2016/

  9. GHTorrent is an offline repository of data collected through the GitHub REST API.

  10. June 2019 was the most recent dump available when we defined our model. As of July 2021, only two newer versions exist: July 2020 and March 2021.

  11. Git Awards: https://github.com/vdaubry/github-awards

  12. The average node degree is calculated over all nodes in the network.

  13. The average clustering coefficient (CC) is calculated over all nodes in the network.

  14. A graph is complete when every pair of distinct nodes is connected by exactly one edge.
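
Notes 12–14 rely on standard graph measures. The sketch below computes them for a simple undirected graph stored as adjacency sets: the average degree and the average local clustering coefficient, both averaged over all nodes, plus a completeness check requiring every node to be adjacent to all others. It assumes that CC denotes the local clustering coefficient and that nodes with fewer than two neighbors contribute a coefficient of 0 (a common convention, assumed here). The function names and the toy edge list are hypothetical, not the article's code.

```python
from itertools import combinations


def build_adjacency(edge_list):
    """Simple undirected graph as adjacency sets; self-loops and duplicate edges are dropped."""
    adj = {}
    for u, v in edge_list:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbors, taken over all nodes."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient, taken over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def is_complete(adj):
    """True when every node is adjacent to every other node."""
    n = len(adj)
    return all(len(nbrs) == n - 1 for nbrs in adj.values())


if __name__ == "__main__":
    # Toy graph: a triangle a-b-c plus a pendant node d attached to c.
    g = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
    print(average_degree(g), average_clustering(g), is_complete(g))
```

For the toy graph, the average degree is 2.0, the average clustering coefficient is 7/12 (about 0.583), and the graph is not complete; removing the edge to `d` leaves a triangle, which is complete.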

References

  • Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the Web. Social Networks, 25(3), 211–230. https://doi.org/10.1016/S0378-8733(03)00009-1

  • Aggarwal, C. C. (2016). Recommender Systems: The Textbook. Springer. https://doi.org/10.1007/978-3-319-29659-3

  • Almarimi, N., Ouni, A., & Mkaouer, M. W. (2020). Learning to detect community smells in open source software projects. Knowledge-Based Systems, 204, 106201. https://doi.org/10.1016/j.knosys.2020.106201

  • Anvik, J., Hiew, L., & Murphy, G. C. (2006). Who should fix this bug? In International Conference on Software Engineering (pp. 361–370). Shanghai, China. https://doi.org/10.1145/1134285.1134336

  • Avelino, G., Passos, L., Hora, A., & Valente, M. T. (2016). A novel approach for estimating truck factors. In Int’l Conf. on Program Comprehension (pp. 1–10). IEEE Computer Society. https://doi.org/10.1109/ICPC.2016.7503718

  • Avelino, G., Passos, L., Hora, A., & Valente, M. T. (2017). Assessing code authorship: The case of the Linux kernel. In International Conference on Open Source Systems (OSS) (pp. 151–163). Buenos Aires, Argentina. https://doi.org/10.1007/978-3-319-57735-7_15

  • Bagley, C. A., & Chou, C. C. (2007). Collaboration and the importance for novices in learning Java computer programming. SIGCSE Bulletin, 39(3), 211–215.

  • Barabási, A. L. (2016). Network science. Cambridge University Press.

  • Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  • Batista, N. A., Brandão, M. A., Alves, G. B., da Silva, A. P. C., & Moro, M. M. (2017). Collaboration strength metrics and analyses on GitHub. In Proceedings of the International Conference on Web Intelligence (pp. 170–178). Leipzig, Germany.

  • Baysal, O., Godfrey, M. W., & Cohen, R. (2009). A bug you like: A framework for automated assignment of bugs. In International Conference on Program Comprehension (pp. 297–298). Vancouver, Canada. https://doi.org/10.1109/ICPC.2009.5090066

  • Bhasin, T., Murray, A., & Storey, M. D. (2021). Student experiences with GitHub and Stack Overflow: An exploratory study. In IEEE/ACM Int’l Workshop on Cooperative and Human Aspects of Software Engineering (CHASE) (pp. 81–90). IEEE, Madrid, Spain. https://doi.org/10.1109/CHASE52884.2021.00017

  • Blincoe, K., Harrison, F., & Damian, D. (2015). Ecosystems in GitHub and a method for ecosystem identification using reference coupling. In IEEE/ACM 12th Working Conference on Mining Software Repositories (pp. 202–211). https://doi.org/10.1109/MSR.2015.26

  • Borges, H., Hora, A., & Valente, M. T. (2016). Understanding the factors that impact the popularity of GitHub repositories. In IEEE International Conference on Software Maintenance and Evolution (pp. 334–344). https://doi.org/10.1109/ICSME.2016.31

  • Brandão, M. A., & Moro, M. M. (2017). The strength of co-authorship ties through different topological properties. Journal of the Brazilian Computer Society, 23(1). https://doi.org/10.1186/s13173-017-0055-x

  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1–7), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X

  • Çaglayan, B., & Bener, A. B. (2016). Effect of developer collaboration activity on software quality in two large scale projects. Journal of Systems and Software, 118, 288–296.

  • Colakoglu, F. N., Yazici, A., & Mishra, A. (2021). Software product quality metrics: A systematic mapping study. IEEE Access, 9, 44647–44670. https://doi.org/10.1109/ACCESS.2021.3054730

  • Constantinou, E., & Mens, T. (2017). Socio-technical evolution of the Ruby ecosystem in GitHub. In IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (pp. 34–44). Klagenfurt, Austria. https://doi.org/10.1109/SANER.2017.7884607

  • Costa, A., et al. (2020). Team formation in software engineering: A systematic mapping study. IEEE Access, 8, 145687–145712. https://doi.org/10.1109/ACCESS.2020.3015017

  • Dalla Palma, S., et al. (2020). Towards a catalogue of software quality metrics for infrastructure code. Journal of Systems and Software, 110726.

  • Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.

  • Emerson, P. (2013). The original borda count and partial voting. Social Choice and Welfare, 40(2), 353–358.

  • Garousi, V., Tarhan, A., Pfahl, D., Coşkunçay, A., & Demirörs, O. (2019). Correlation of critical success factors with success of software projects: an empirical investigation. Software Quality Journal, 27, 429–493. https://doi.org/10.1007/s11219-018-9419-5

  • Gousios, G. (2013). The GHTorrent Dataset and Tool Suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (pp. 233–236).

  • Gousios, G., et al. (2014). Lean GHTorrent: GitHub data on demand. In 11th Working Conference on Mining Software Repositories (pp. 384–387). Hyderabad, India. https://doi.org/10.1145/2597073.2597126

  • Hong, Q., et al. (2011). Understanding a developer social network and its evolution. In IEEE 27th International Conference on Software Maintenance, ICSM (pp. 323–332). IEEE Computer Society. https://doi.org/10.1109/ICSM.2011.6080799

  • Jere, S., Jayannavar, L., Ali, A., & Kulkarni, C. (2017). Recruitment graph model for hiring unique competencies using social media mining. In Proceedings of the International Conference on Machine Learning and Computing (pp. 461–466). Singapore. https://doi.org/10.1145/3055635.3056575

  • Jiang, J., et al. (2019). Who should make decision on this pull request? Analyzing time-decaying relationships and file similarities for integrator prediction. Journal of Systems and Software, 154, 196–210. https://doi.org/10.1016/j.jss.2019.04.055

  • Joblin, M., et al. (2017). Classifying developers into core and peripheral: An empirical study on count and network metrics. In Proceedings of the 39th International Conference on Software Engineering (pp. 164–174). Buenos Aires, Argentina. https://doi.org/10.1109/ICSE.2017.23

  • Leibzon, W. (2016). Social network of software development at GitHub. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 1374–1376). San Francisco, USA. https://doi.org/10.1109/ASONAM.2016.7752419

  • Lenhard, J., Blom, M., & Herold, S. (2019). Exploring the suitability of source code metrics for indicating architectural inconsistencies. Software Quality Journal, 27, 241–274. https://doi.org/10.1007/s11219-018-9404-z

  • Li, H., et al. (2020). Privacy leakage via de-anonymization and aggregation in heterogeneous social networks. IEEE Transactions on Dependable and Secure Computing, 17(2), 350–362. https://doi.org/10.1109/TDSC.2017.2754249

  • Lima, A., Rossi, L., & Musolesi, M. (2014). Coding together at scale: Github as a collaborative social network. In Proceedings of the Eighth International Conference on Weblogs and Social Media (pp. 295–304). Ann Arbor, USA.

  • Madeyski, L., & Jureczko, M. (2015). Which process metrics can significantly improve defect prediction models? An empirical study. Software Quality Journal, 23, 393–422. https://doi.org/10.1007/s11219-014-9241-7

  • Majumder, S., Mody, P., & Menzies, T. (2020). Revisiting process versus product metrics: a large scale analysis. CoRR abs/2008.09569. https://arxiv.org/abs/2008.09569

  • Malhotra, R., & Chug, A. (2013). An empirical study to redefine the relationship between software design metrics and maintainability in high data intensive applications. In Proceedings of the World Congress on Engineering and Computer Science (vol. 1).

  • Meneely, A., & Williams, L. (2011). Socio-technical developer networks: Should we trust our measurements? In Proceedings of the International Conference on Software Engineering (pp. 281–290). Honolulu, USA. https://doi.org/10.1145/1985793.1985832

  • Meneely, A., et al. (2008). Predicting failures with developer networks and social network analysis. In ACM SIGSOFT International Symposium on Foundations of Software Engineering (pp. 13–23). Atlanta, USA. https://doi.org/10.1145/1453101.1453106

  • Montandon, J. E., et al. (2021). What skills do IT companies look for in new developers? A study with stack overflow jobs. Information and Software Technology, 129, 106429. https://doi.org/10.1016/j.infsof.2020.106429

  • Nguyen, P. T., Rocco, J. D., Rubei, R., & Ruscio, D. D. (2020). An automated approach to assess the similarity of GitHub repositories. Software Quality Journal, 28(2), 595–631. https://doi.org/10.1007/s11219-019-09483-0

  • Oliveira, G. P., Batista, N. A., Brandão, M. A., & Moro, M. M. (2018). Tie strength in GitHub heterogeneous networks. In Brazilian Symposium on Multimedia and the Web (pp. 363–370). https://doi.org/10.1145/3243082.3243101

  • Oliveira, G. P., Moura, A. F. C., Batista, N. A., Brandão, M. A., & Moro, M. M. (2021). GitSED: GitHub Socially Enhanced Dataset. https://doi.org/10.5281/zenodo.5021329

  • Palomba, F., et al. (2018). Beyond technical aspects: How do community smells influence the intensity of code smells? IEEE Transactions on Software Engineering.

  • Rahman, F., & Devanbu, P. T. (2013). How, and why, process metrics are better. In D. Notkin, B. H. C. Cheng, & K. Pohl (Eds.), International Conference on Software Engineering (pp. 432–441). IEEE Computer Society. https://doi.org/10.1109/ICSE.2013.6606589

  • Rahman, M. M., & Roy, C. K. (2014). An insight into the pull requests of GitHub. In ACM 11th Working Conference on Mining Software Repositories (pp. 364–367).

  • Rocha, L. M. A., et al. (2016). Análise da Contribuição para Código entre Repositórios do GitHub. In Brazilian Symposium on Databases - Short Papers (pp. 103–108).

  • Sarma, A., et al. (2016). Hiring in the global stage: Profiles of online contributions. In 11th IEEE International Conference on Global Software Engineering (pp. 1–10). Orange County, CA, USA. https://doi.org/10.1109/ICGSE.2016.35

  • Silva, H., & Valente, M. T. (2018). What’s in a GitHub star? Understanding repository starring practices in a social coding platform. Journal of Systems and Software, 146, 112–129. https://doi.org/10.1016/j.jss.2018.09.016

  • Singer, L., et al. (2013). Mutual assessment in the social programmer ecosystem: an empirical investigation of developer profile aggregators. In Computer Supported Cooperative Work (pp. 103–116). San Antonio, TX, USA. https://doi.org/10.1145/2441776.2441791

  • Singh, P. V. (2010). The small-world effect: The influence of macro-level properties of developer collaboration networks on open-source project success. ACM Transactions on Software Engineering and Methodology, 20(2), 6:1–6:27.

  • Tamburri, D. A., et al. (2019). Discovering community patterns in open-source: a systematic approach and its evaluation. Empirical Software Engineering, 24(3), 1369–1417.

  • Torres, N. (2015). Technology is only making social skills more important. Harvard Business Review, August 26, 2015.

  • Wang, S., et al. (2018). EnTagRec++: An enhanced tag recommendation system for software information sites. Empirical Software Engineering, 23(2), 800–832. https://doi.org/10.1007/s10664-017-9533-1

  • Young, H. P. (1988). Condorcet’s theory of voting. American Political Science Review, 82(4), 1231–1244.

  • Yu, Y., Wang, H., Yin, G., & Wang, T. (2016). Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Information and Software Technology, 74, 204–218. https://doi.org/10.1016/j.infsof.2016.01.004

  • Yu, Y., et al. (2014a). Exploring the patterns of social behavior in GitHub. In International Workshop on Crowd-based Software Development Methods and Technologies (pp. 31–36). https://doi.org/10.1145/2666539.2666571

  • Yu, Y., et al. (2014b). Reviewer recommender of pull-requests in GitHub. In International Conference on Software Maintenance and Evolution (pp. 609–612). Victoria, Canada. https://doi.org/10.1109/ICSME.2014.107

  • Zhang, Y., et al. (2017). Detecting similar repositories on GitHub. In IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (pp. 13–23). Klagenfurt, Austria. https://doi.org/10.1109/SANER.2017.7884605

  • Zhou, C., Kuttal, S. K., & Ahmed, I. (2018). What makes a good developer? An empirical study of developers’ technical and social competencies. In IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC (pp. 319–321). Lisbon, Portugal. https://doi.org/10.1109/VLHCC.2018.8506577

Funding

This work was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.

Author information

Contributions

Gabriel P. Oliveira: conceptualization, methodology, software, formal analysis, investigation, writing – original draft, visualization. Ana Flávia C. Moura: software, investigation, data curation. Natércia A. Batista: conceptualization, methodology, investigation, writing – original draft. Michele A. Brandão: conceptualization, methodology, formal analysis, investigation, writing – original draft. Andre Hora: conceptualization, validation, writing – review and editing, supervision. Mirella M. Moro: conceptualization, resources, writing – review and editing, supervision, project administration, funding acquisition.

Corresponding author

Correspondence to Mirella M. Moro.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oliveira, G.P., Moura, A.F.C., Batista, N.A. et al. How do developers collaborate? Investigating GitHub heterogeneous networks. Software Qual J 31, 211–241 (2023). https://doi.org/10.1007/s11219-022-09598-x

  • DOI: https://doi.org/10.1007/s11219-022-09598-x