DOI: 10.1145/3196398.3196436
research-article

Large-scale analysis of the co-commit patterns of the active developers in GitHub's top repositories

Published: 28 May 2018

ABSTRACT

GitHub, the largest code hosting site (with 25 million active public repositories and contributions from 6 million active users), provides an unprecedented opportunity to observe the collaboration patterns of software developers. Understanding the patterns behind the social coding phenomenon is an active research area: the insights gained can guide the design of better collaboration tools and can also help identify and select developer talent. In this paper, we present a large-scale analysis of the co-commit patterns in GitHub. We analyze 10 million commits made by 200 thousand developers to 16 thousand repositories written in 17 of the most popular programming languages, over a period of 3 years. Although our study includes a large volume of data, we pay close attention to the participation criteria for repositories and developers. We select repositories by reputation (based on star ranking), and we introduce the notion of an active developer in GitHub, motivated by the observation that a limited subset of developers is responsible for the vast majority of the commits. Using co-authorship networks, we analyze the co-commit patterns of the active developer network for each programming language. We observe that the active developer networks are less connected and more centralized than the general GitHub developer networks, and that the patterns vary significantly among languages. We compare our results to other collaborative environments (Wikipedia and scientific research networks), and we also describe the evolution of the co-commit patterns over time.
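The abstract does not state the exact selection rule or network metrics, so the following is only a minimal Python sketch of the kind of pipeline it describes, under explicitly hypothetical assumptions. Commits are (developer, repository) pairs. "Active" developers are taken to be the smallest set of top committers covering a hypothetical 80% share of all commits. Two developers are linked when both committed to the same repository. "Connectedness" is measured here as the fraction of nodes in the largest connected component, and "centralization" as Freeman's degree centralization. The function names and the threshold are illustrative, not the authors' method.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from itertools import combinations


def active_developers(commits: list[tuple[str, str]], share: float = 0.8) -> set[str]:
    """Smallest set of top committers covering at least `share` of all commits.

    The 0.8 default is a hypothetical threshold, not the paper's criterion.
    """
    counts = Counter(dev for dev, _ in commits)
    total = sum(counts.values())
    selected: set[str] = set()
    covered = 0
    for dev, n in counts.most_common():  # developers by descending commit count
        if covered >= share * total:
            break
        selected.add(dev)
        covered += n
    return selected


def co_commit_network(commits: list[tuple[str, str]], developers: set[str]) -> dict[str, set[str]]:
    """Undirected adjacency sets: developers are linked if they committed to the same repository."""
    repo_devs: dict[str, set[str]] = defaultdict(set)
    for dev, repo in commits:
        if dev in developers:
            repo_devs[repo].add(dev)
    adj: dict[str, set[str]] = {dev: set() for dev in developers}
    for devs in repo_devs.values():
        for a, b in combinations(sorted(devs), 2):  # every co-committing pair
            adj[a].add(b)
            adj[b].add(a)
    return adj


def largest_component_fraction(adj: dict[str, set[str]]) -> float:
    """Share of nodes in the largest connected component (iterative DFS)."""
    seen: set[str] = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(adj) if adj else 0.0


def degree_centralization(adj: dict[str, set[str]]) -> float:
    """Freeman degree centralization: sum(max_deg - deg) / ((n - 1)(n - 2)).

    Equals 1.0 for a star and 0.0 for a regular graph.
    """
    n = len(adj)
    if n < 3:
        return 0.0
    degrees = [len(nbrs) for nbrs in adj.values()]
    max_deg = max(degrees)
    return sum(max_deg - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical toy data, not taken from the study.
sample_commits = [
    ("alice", "repo1"), ("alice", "repo1"), ("alice", "repo2"),
    ("bob", "repo1"), ("carol", "repo2"), ("dave", "repo3"),
]
active = active_developers(sample_commits)
network = co_commit_network(sample_commits, active)
print(sorted(active), largest_component_fraction(network), degree_centralization(network))
```

In this toy example, alice, bob, and carol cover 5 of the 6 commits, so dave is excluded; alice shares a repository with each of the others, forming a star, so both the largest-component fraction and the degree centralization are 1.0. A real analysis at the scale described would more likely use a dedicated graph library than hand-written traversal.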

    Published in

    MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
    May 2018
    627 pages
    ISBN: 9781450357166
    DOI: 10.1145/3196398

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher: Association for Computing Machinery, New York, NY, United States
