ABSTRACT
GitHub, the largest code hosting site (with 25 million public active repositories and contributions from 6 million active users), provides an unprecedented opportunity to observe the collaboration patterns of software developers. Understanding the patterns behind the social coding phenomenon is an active research area: the insights gained can guide the design of better collaboration tools and can help to identify and select developer talent. In this paper, we present a large-scale analysis of co-commit patterns in GitHub. We analyze 10 million commits made by 200 thousand developers to 16 thousand repositories, covering 17 of the most popular programming languages over a period of 3 years. Although our study includes a large volume of data, we pay close attention to the participation criteria for repositories and developers. We select repositories by reputation (based on star ranking), and we introduce the notion of an active developer in GitHub, observing that a limited subset of developers is responsible for the vast majority of the commits. Using co-authorship networks, we analyze the co-commit patterns of the active developer network for each programming language. We observe that the active developer networks are less connected and more centralized than the general GitHub developer networks, and that the patterns vary significantly among languages. We compare our results to other collaborative environments (Wikipedia and scientific research networks), and we describe the evolution of the co-commit patterns over time.
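To make the idea of a co-commit network concrete, the following is a minimal sketch and not the paper's implementation. It assumes commits are available as (developer, repository) pairs, links two developers when they committed to the same repository, and treats as "active" the smallest set of top committers covering a chosen share of all commits. The 80% threshold, the function names, and the toy data are all hypothetical; the abstract does not state the actual criterion. Density and Freeman's degree centralization stand in for "connectedness" and "centralization".

```python
from collections import Counter, defaultdict
from itertools import combinations


def active_developers(commits, share=0.8):
    """Smallest set of top committers covering `share` of all commits.

    Assumption: this threshold rule is illustrative only; the paper's
    actual definition of an active developer is not given in the abstract.
    """
    counts = Counter(dev for dev, _ in commits)
    total = sum(counts.values())
    active, covered = set(), 0
    for dev, n in counts.most_common():
        if covered >= share * total:
            break
        active.add(dev)
        covered += n
    return active


def co_commit_edges(commits, developers):
    """Undirected edges between developers who committed to the same repository."""
    repo_devs = defaultdict(set)
    for dev, repo in commits:
        if dev in developers:
            repo_devs[repo].add(dev)
    edges = set()
    for devs in repo_devs.values():
        edges.update(combinations(sorted(devs), 2))
    return edges


def density(n_nodes, edges):
    """Fraction of possible developer pairs that are actually linked."""
    if n_nodes < 2:
        return 0.0
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))


def degree_centralization(nodes, edges):
    """Freeman degree centralization: 1.0 for a star, 0.0 for a regular graph."""
    n = len(nodes)
    if n < 3:
        return 0.0
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree[x] for x in nodes)
    return sum(max_degree - degree[x] for x in nodes) / ((n - 1) * (n - 2))


# Toy data (hypothetical): one (developer, repository) pair per commit.
commits = (
    [("alice", "r1")] * 4 + [("alice", "r2")] * 2
    + [("bob", "r1")] * 3 + [("carol", "r2")] * 2 + [("dave", "r2")]
)

everyone = {dev for dev, _ in commits}
active = active_developers(commits)

for label, devs in (("all developers", everyone), ("active developers", active)):
    edges = co_commit_edges(commits, devs)
    print(label, len(devs), round(density(len(devs), edges), 3),
          round(degree_centralization(devs, edges), 3))
```

Running the same two measures on the full network and on the active subset, once per programming language, gives the kind of side-by-side comparison the abstract describes.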