Abstract
Assessing collaboration among developers is important for understanding different aspects of the software lifecycle, including code smell intensity, bug fixes, and software quality. Such collaboration can be captured through social networks, which represent interactions between individuals in different contexts. In this paper, we model GitHub developers’ collaborations as a heterogeneous network by considering three aspects: social collaboration, collaboration time in a repository, and technical features. We then explore the GitHub network from different perspectives: size, relevance, and potential applications. The results show that the considered metrics are not correlated, so each brings new information about the collaborations. We also show that such information is useful for social developer ranking, a practical task that is often part of applications such as team formation, community detection, and pair programming. Finally, as software quality is intrinsic to the people who write the code, our methodology and analyses represent initial steps towards people-centered software quality analysis, as further discussed throughout this article.
Availability of data and material
Our dataset, called GitHub Socially Enhanced Dataset (GitSED), is publicly available on Zenodo (Oliveira et al., 2021).
Code availability
All relevant information related to this work is available on the Project Apoena homepage: https://bit.ly/proj-apoena
Notes
GitHub Mastering Issues: https://guides.github.com/features/issues
Following GitHub documentation, repositories owned by user accounts have one owner, and ownership permissions cannot be shared with another user account. Owners may also invite users on GitHub to their repositories as collaborators.
Following People: https://help.github.com/en/articles/following-people
An initial version of our research was published in Portuguese at a local venue (Rocha et al., 2016).
TIOBE Index: https://www.tiobe.com/tiobe-index
As of July 2021, most of these languages remain in the top 12, with the exceptions of Ruby (17th), Perl (18th), and Pascal (20th). Their places in the top 12 are now filled by SQL (a database language, 10th), Classic Visual Basic (11th), and R (environment-oriented, 12th). Hence, our analyses remain relevant, as they consider the most widely used programming languages.
GitHub’s state of Octoverse: https://octoverse.github.com/2016/
GHTorrent is an offline repository of data collected through the GitHub REST API.
June 2019 was the most recent dump available when we defined our model. As of July 2021, there are only two newer versions: July 2020 and March 2021.
Git Awards: https://github.com/vdaubry/github-awards
The average node degree is calculated over all nodes in the network.
The average CC is calculated over all nodes in the network.
A graph is complete when every pair of distinct nodes is connected by a unique edge.
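The three network notes above (average node degree, average CC, and graph completeness) can be illustrated with a minimal sketch. It assumes that CC denotes the local clustering coefficient and that the network is a simple undirected graph without self-loops, stored as a dictionary mapping each node to its set of neighbors. The function names and the toy graph are hypothetical and are not taken from the article or its dataset.

```python
from itertools import combinations


def average_degree(adj):
    """Mean number of neighbors, taken over all nodes in the network."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: CC is 0 with fewer than two neighbors
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient, taken over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def is_complete(adj):
    """True when every node is adjacent to all other nodes."""
    n = len(adj)
    return all(len(nbrs) == n - 1 for nbrs in adj.values())


# Hypothetical toy network: a triangle (a, b, c) plus a pendant node d.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}

print(average_degree(toy))      # degrees 2, 2, 3, 1 averaged over 4 nodes
print(average_clustering(toy))  # local CCs 1, 1, 1/3, 0 averaged over 4 nodes
print(is_complete(toy), is_complete(triangle))
```

Averaging over all nodes, as the notes specify, means that isolated or degree-one nodes still count in the denominator, which pulls the average CC down in sparse networks.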
References
Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the Web. Social Networks, 25(3), 211–230. https://doi.org/10.1016/S0378-8733(03)00009-1
Aggarwal, C. C. (2016). Recommender Systems - The Textbook. Springer. https://doi.org/10.1007/978-3-319-29659-3
Almarimi, N., Ouni, A., & Mkaouer, M. W. (2020). Learning to detect community smells in open source software projects. Knowledge-Based Systems, 204, 106201. https://doi.org/10.1016/j.knosys.2020.106201
Anvik, J., Hiew, L., & Murphy, G. C. (2006). Who should fix this bug? In International Conference on Software Engineering (pp. 361–370). Shanghai, China. https://doi.org/10.1145/1134285.1134336
Avelino, G., Passos, L., Hora, A., & Valente, M. T. (2016). A novel approach for estimating truck factors. In Int’l Conf. on Program Comprehension (pp. 1–10). IEEE Computer Society. https://doi.org/10.1109/ICPC.2016.7503718
Avelino, G., Passos, L., Hora, A., & Valente, M. T. (2017). Assessing code authorship: The case of the Linux kernel. In International Conference on Open Source Systems (OSS) (pp. 151–163). Buenos Aires, Argentina. https://doi.org/10.1007/978-3-319-57735-7_15
Bagley, C. A., & Chou, C. C. (2007). Collaboration and the importance for novices in learning Java computer programming. SIGCSE Bulletin, 39(3), 211–215.
Barabási, A. L. (2016). Network science. Cambridge University Press.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Batista, N. A., Brandão, M. A., Alves, G. B., da Silva, A. P. C., & Moro, M. M. (2017). Collaboration strength metrics and analyses on GitHub. In Proceedings of the International Conference on Web Intelligence (pp. 170–178). Leipzig, Germany.
Baysal, O., Godfrey, M. W., & Cohen, R. (2009). A bug you like: A framework for automated assignment of bugs. In International Conference on Program Comprehension (pp. 297–298). Vancouver, Canada. https://doi.org/10.1109/ICPC.2009.5090066
Bhasin, T., Murray, A., & Storey, M. D. (2021). Student experiences with github and stack overflow: An exploratory study. In IEEE/ACM Int’l Workshop on Cooperative and Human Aspects of Software Engineering (CHASE) (pp. 81–90). IEEE, Madrid, Spain. https://doi.org/10.1109/CHASE52884.2021.00017
Blincoe, K., Harrison, F., & Damian, D. (2015). Ecosystems in github and a method for ecosystem identification using reference coupling. In IEEE/ACM 12th Working Conference on Mining Software Repositories (pp. 202–211). https://doi.org/10.1109/MSR.2015.26
Borges, H., Hora, A., & Valente, M. T. (2016). Understanding the factors that impact the popularity of GitHub repositories. In IEEE International Conference on Software Maintenance and Evolution (pp. 334–344). https://doi.org/10.1109/ICSME.2016.31
Brandão, M. A., & Moro, M. M. (2017). The strength of co-authorship ties through different topological properties. Journal of the Brazilian Computer Society, 23(1). https://doi.org/10.1186/s13173-017-0055-x
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1–7), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X
Çaglayan, B., & Bener, A. B. (2016). Effect of developer collaboration activity on software quality in two large scale projects. Journal of Systems and Software, 118, 288–296.
Colakoglu, F. N., Yazici, A., & Mishra, A. (2021). Software product quality metrics: A systematic mapping study. IEEE Access, 9, 44647–44670. https://doi.org/10.1109/ACCESS.2021.3054730
Constantinou, E., & Mens, T. (2017). Socio-technical evolution of the ruby ecosystem in github. In IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (pp. 34–44). Klagenfurt, Austria. https://doi.org/10.1109/SANER.2017.7884607
Costa, A., et al. (2020). Team formation in software engineering: A systematic mapping study. IEEE Access, 8, 145687–145712. https://doi.org/10.1109/ACCESS.2020.3015017
Dalla Palma, S., et al. (2020). Towards a catalogue of software quality metrics for infrastructure code. Journal of Systems and Software, 110726.
Easley, D., & Kleinberg, J. (2010). Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press.
Emerson, P. (2013). The original borda count and partial voting. Social Choice and Welfare, 40(2), 353–358.
Garousi, V., Tarhan, A., Pfahl, D., Coşkunçay, A., & Demirörs, O. (2019). Correlation of critical success factors with success of software projects: an empirical investigation. Software Quality Journal, 27, 429–493. https://doi.org/10.1007/s11219-018-9419-5
Gousios, G. (2013). The GHTorrent Dataset and Tool Suite. In Proceedings of the 10th Working Conference on Mining Software Repositories (pp. 233–236).
Gousios, G., et al. (2014). Lean GHTorrent: GitHub data on demand. In 11th Working Conference on Mining Software Repositories (pp. 384–387). Hyderabad, India. https://doi.org/10.1145/2597073.2597126
Hong, Q., et al. (2011). Understanding a developer social network and its evolution. In IEEE 27th International Conference on Software Maintenance, ICSM (pp. 323–332). IEEE Computer Society. https://doi.org/10.1109/ICSM.2011.6080799
Jere, S., Jayannavar, L., Ali, A., & Kulkarni, C. (2017). Recruitment graph model for hiring unique competencies using social media mining. In Proceedings of the International Conference on Machine Learning and Computing (pp. 461–466). Singapore. https://doi.org/10.1145/3055635.3056575
Jiang, J., et al. (2019). Who should make decision on this pull request? analyzing time-decaying relationships and file similarities for integrator prediction. Journal of Systems and Software, 154, 196–210. https://doi.org/10.1016/j.jss.2019.04.055
Joblin, M., et al. (2017). Classifying developers into core and peripheral: An empirical study on count and network metrics. In Proceedings of the 39th International Conference on Software Engineering (pp. 164–174). Buenos Aires, Argentina. https://doi.org/10.1109/ICSE.2017.23
Leibzon, W. (2016). Social network of software development at GitHub. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 1374–1376). San Francisco, USA. https://doi.org/10.1109/ASONAM.2016.7752419
Lenhard, J., Blom, M., & Herold, S. (2019). Exploring the suitability of source code metrics for indicating architectural inconsistencies. Software Quality Journal, 27, 241–274. https://doi.org/10.1007/s11219-018-9404-z
Li, H., et al. (2020). Privacy leakage via de-anonymization and aggregation in heterogeneous social networks. IEEE Transactions on Dependable and Secure Computing, 17(2), 350–362. https://doi.org/10.1109/TDSC.2017.2754249
Lima, A., Rossi, L., & Musolesi, M. (2014). Coding together at scale: Github as a collaborative social network. In Proceedings of the Eighth International Conference on Weblogs and Social Media (pp. 295–304). Ann Arbor, USA.
Madeyski, L., & Jureczko, M. (2015). Which process metrics can significantly improve defect prediction models? an empirical study. Software Quality Journal, 23, 393–422. https://doi.org/10.1007/s11219-014-9241-7
Majumder, S., Mody, P., & Menzies, T. (2020). Revisiting process versus product metrics: a large scale analysis. CoRR abs/2008.09569. https://arxiv.org/abs/2008.09569
Malhotra, R., & Chug, A. (2013). An empirical study to redefine the relationship between software design metrics and maintainability in high data intensive applications. In Proceedings of the World Congress on Engineering and Computer Science (vol. 1).
Meneely, A., & Williams, L. (2011). Socio-technical developer networks: Should we trust our measurements? In Proceedings of the International Conference on Software Engineering (pp. 281–290). Honolulu, USA. https://doi.org/10.1145/1985793.1985832
Meneely, A., et al. (2008). Predicting failures with developer networks and social network analysis. In ACM SIGSOFT International Symposium on Foundations of Software Engineering (pp. 13–23). Atlanta, USA. https://doi.org/10.1145/1453101.1453106
Montandon, J. E., et al. (2021). What skills do IT companies look for in new developers? A study with stack overflow jobs. Information and Software Technology, 129, 106429. https://doi.org/10.1016/j.infsof.2020.106429
Nguyen, P. T., Rocco, J. D., Rubei, R., & Ruscio, D. D. (2020). An automated approach to assess the similarity of github repositories. Software Quality Journal, 28(2), 595–631. https://doi.org/10.1007/s11219-019-09483-0
Oliveira, G. P., Batista, N. A., Brandão, M. A., & Moro, M. M. (2018). Tie strength in GitHub heterogeneous networks. In Brazilian Symposium on Multimedia and the Web (pp. 363–370). https://doi.org/10.1145/3243082.3243101
Oliveira, G. P., Moura, A. F. C., Batista, N. A., Brandão, M. A., & Moro, M. M. (2021). GitSED: GitHub Socially Enhanced Dataset. https://doi.org/10.5281/zenodo.5021329
Palomba, F., et al. (2018). Beyond technical aspects: How do community smells influence the intensity of code smells? IEEE Transactions on Software Engineering.
Rahman, F., & Devanbu, P. T. (2013). How, and why, process metrics are better. In D. Notkin, B. H. C. Cheng, & K. Pohl (Eds.). International Conference on Software Engineering, IEEE Computer Society (pp. 432–441). https://doi.org/10.1109/ICSE.2013.6606589
Rahman, M. M., & Roy, C. K. (2014). An insight into the pull requests of github. In ACM 11th Working Conference on Mining Software Repositories (pp. 364–367).
Rocha, L. M. A., et al. (2016). Análise da Contribuição para Código entre Repositórios do GitHub. In Brazilian Symposium on Databases - Short Papers (pp. 103–108).
Sarma, A., et al. (2016). Hiring in the global stage: Profiles of online contributions. In 11th IEEE International Conference on Global Software Engineering (pp. 1–10). Orange County, CA, USA. https://doi.org/10.1109/ICGSE.2016.35
Silva, H., & Valente, M. T. (2018). What’s in a GitHub star? understanding repository starring practices in a social coding platform. Journal of Systems and Software, 146, 112–129. https://doi.org/10.1016/j.jss.2018.09.016
Singer, L., et al. (2013). Mutual assessment in the social programmer ecosystem: an empirical investigation of developer profile aggregators. In Computer Supported Cooperative Work (pp. 103–116). San Antonio, TX, USA. https://doi.org/10.1145/2441776.2441791
Singh, P. V. (2010). The small-world effect: The influence of macro-level properties of developer collaboration networks on open-source project success. ACM Transactions on Software Engineering and Methodology, 20(2), 6:1–6:27.
Tamburri, D. A., et al. (2019). Discovering community patterns in open-source: a systematic approach and its evaluation. Empirical Software Engineering, 24(3), 1369–1417.
Torres, N. (2015). Technology is only making social skills more important. Harvard Business Review, August 26, 2015.
Wang, S., et al. (2018). EnTagRec++: An enhanced tag recommendation system for software information sites. Empirical Software Engineering, 23(2), 800–832. https://doi.org/10.1007/s10664-017-9533-1
Young, H. P. (1988). Condorcet’s theory of voting. American Political Science Review, 82(4), 1231–1244.
Yu, Y., Wang, H., Yin, G., & Wang, T. (2016). Reviewer recommendation for pull-requests in github: What can we learn from code review and bug assignment? Information and Software Technology, 74, 204–218. https://doi.org/10.1016/j.infsof.2016.01.004
Yu, Y., et al. (2014a). Exploring the patterns of social behavior in github. In International Workshop on Crowd-based Software Development Methods and Technologies (pp. 31–36). https://doi.org/10.1145/2666539.2666571
Yu, Y., et al. (2014b). Reviewer recommender of pull-requests in github. In International Conference on Software Maintenance and Evolution (pp. 609–612). Victoria, Canada. https://doi.org/10.1109/ICSME.2014.107
Zhang, Y., et al. (2017). Detecting similar repositories on github. In IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (pp. 13–23). Klagenfurt, Austria. https://doi.org/10.1109/SANER.2017.7884605
Zhou, C., Kuttal, S. K., & Ahmed, I. (2018). What makes a good developer? an empirical study of developers’ technical and social competencies. In IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC (pp. 319–321). Lisbon, Portugal. https://doi.org/10.1109/VLHCC.2018.8506577
Funding
This work was supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil.
Author information
Contributions
Gabriel P. Oliveira: conceptualization, methodology, software, formal analysis, investigation, writing – original draft, visualization. Ana Flávia C. Moura: software, investigation, data curation. Natércia A. Batista: conceptualization, methodology, investigation, writing – original draft. Michele A. Brandão: conceptualization, methodology, formal analysis, investigation, writing – original draft. Andre Hora: conceptualization, validation, writing – review and editing, supervision. Mirella M. Moro: conceptualization, resources, writing – review and editing, supervision, project administration, funding acquisition.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Oliveira, G.P., Moura, A.F.C., Batista, N.A. et al. How do developers collaborate? Investigating GitHub heterogeneous networks. Software Qual J 31, 211–241 (2023). https://doi.org/10.1007/s11219-022-09598-x
DOI: https://doi.org/10.1007/s11219-022-09598-x