Abstract
This paper develops methods to estimate the factors that affect the impact of open-source software (OSS), measured by the number of downloads, through a study of Python and R packages. The OSS community is characterized by a high level of collaboration and sharing, which creates interactions among contributors and, through reuse, among packages. We use data collected from Depsy.org about the development activities of Python and R packages to generate the dependency and contributor networks. We develop three Quasi-Poisson models for each of the Python and R communities using network characteristics as well as author and package attributes. We find that the more derivative a package is (the more dependencies it has), the less likely it is to have a high impact. We also show that the centrality of a package in the dependency network, measured by out-degree, closeness centrality, and PageRank, has a significant effect on its impact. Moreover, the closeness and weighted degree centralities of the developers in the Python and R contributor networks play an important role. Finally, we find that adding network features to a baseline model that uses only package features (e.g., number of authors, number of commits) improves the performance of the models.
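As a rough illustration of the modeling approach named in the abstract, the sketch below fits a Quasi-Poisson regression of download counts on a single predictor (the number of dependencies). It is not the authors' code: the data are invented toy values, the function name `fit_quasi_poisson` is hypothetical, and the real models use many more covariates. A Quasi-Poisson fit shares its coefficient estimates with ordinary Poisson regression, and differs only in that an estimated dispersion factor scales the variance and hence the standard errors.

```python
import math

def fit_quasi_poisson(x, y, iterations=50):
    """Fit log(E[y]) = b0 + b1 * x and estimate the dispersion.

    Returns (b0, b1, dispersion, se_b1). Hypothetical helper for illustration.
    """
    n = len(y)
    # Start from the intercept-only fit: the log of the mean count.
    b0, b1 = math.log(sum(y) / n), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    # Recompute fitted means and information at the final estimates.
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    h00 = sum(mu)
    h01 = sum(mi * xi for xi, mi in zip(x, mu))
    h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
    det = h00 * h11 - h01 * h01
    # Dispersion: Pearson chi-square divided by residual degrees of freedom.
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    dispersion = pearson / (n - 2)
    # Quasi-Poisson standard error: Poisson variance scaled by the dispersion.
    se_b1 = math.sqrt(dispersion * h00 / det)
    return b0, b1, dispersion, se_b1

# Toy data (invented): dependency counts and download counts per package.
dependencies = [0, 1, 2, 3, 4, 5, 6, 7]
downloads = [900, 1500, 300, 650, 120, 400, 60, 150]

b0, b1, dispersion, se_b1 = fit_quasi_poisson(dependencies, downloads)
print(f"slope={b1:.3f}  dispersion={dispersion:.1f}  se(slope)={se_b1:.3f}")
```

In this toy data the slope is negative and the dispersion is far above 1, which is the situation where a plain Poisson model would understate the standard errors.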
Notes
Other network characteristics were removed because of the high correlations with the other measures included in the models.
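One common way to carry out this kind of pruning is sketched below, under stated assumptions: the note does not give the cutoff or the procedure actually used, so the 0.9 threshold, the greedy keep-first order, the function names, and the toy feature values are all hypothetical.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def prune_correlated(features, threshold=0.9):
    """Greedily keep each feature unless it correlates strongly with a kept one."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy network measures (invented values), one entry per node.
features = {
    "degree": [1, 2, 3, 4, 5, 6],
    "weighted_degree": [2, 4, 6, 8, 10, 13],
    "closeness": [0.9, 0.2, 0.7, 0.1, 0.8, 0.3],
}

kept = prune_correlated(features)
print(kept)
```

Here `weighted_degree` is dropped because it is almost perfectly correlated with `degree`, while `closeness` is retained.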
References
Abbasi A, Altmann J, Hossain L (2011) Identifying the effects of co-authorship networks on the performance of scholars: a correlation and regression analysis of performance measures and social network analysis measures. J Informetr 5(4):594–607
Acedo FJ, Barroso C, Casanueva C, Galán JL (2006) Co-authorship in management and organizational studies: an empirical and network analysis. J Manag Stud 43(5):957–983
Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: International AAAI conference on weblogs and social media. http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/154. Accessed 5 Dec 2019
Bosquet C, Combes PP (2013) Do large departments make academics more productive? Agglomeration and peer effects in research. CEPR Discussion Paper No. DP9401. https://ssrn.com/abstract=2244081. Accessed 5 Dec 2019
Cook RD (1977) Detection of influential observation in linear regression. Technometrics 19(1):15–18
CRAN: The comprehensive R archive network (1997) https://cran.r-project.org/. Accessed 5 Dec 2019
Django: Django overview. https://www.djangoproject.com/start/overview/
Ductor L (2015) Does co-authorship lead to higher academic productivity? Oxf Bull Econ Stat 77(3):385–407
Ductor L, Fafchamps M, Goyal S, van der Leij MJ (2014) Social networks and research output. Rev Econ Stat 96(5):936–948
Goyal S, Van Der Leij MJ, Moraga-González JL (2006) Economics: an emerging small world. J Polit Econ 114(2):403–412
Greenstein S, Nagle F (2014) Digital dark matter and the economic contribution of Apache. Res Policy 43(4):623–631
Grossman JW (2002) The evolution of the mathematical research collaboration graph. Congressus Numerantium, pp 201–212
Hirsch JE (2005) An index to quantify an individual’s scientific research output. Proc Natl Acad Sci USA 102(46):16569
Howison J, Bullard J (2016) Software in the scientific literature: problems with seeing, finding, and using software mentioned in the biology literature. J Assoc Inf Sci Technol 67(9):2137–2155
Howison J, Deelman E et al (2015) Understanding the scientific software ecosystem and its impact: current and future measures. Res Eval 24(4):454–470. https://doi.org/10.1093/reseval/rvv014
IEEE Spectrum: IEEE top programming languages: design, methods, and data (2018a) https://spectrum.ieee.org/static/ieee-top-programming-languages-2018-methods. Accessed 5 Dec 2019
IEEE Spectrum: Interactive: The top programming languages 2018 (2018b) https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2018. Accessed 5 Dec 2019
Ihaka R (2017) The R project: a brief history and thoughts about the future. https://www.stat.auckland.ac.nz/~ihaka/downloads/Massey.pdf. Accessed 5 Dec 2019
Impact Story (2012) https://impactstory.org
Keller S, Korkmaz G, Robbins C, Shipp S (2018) Opportunities to observe and measure intangible inputs to innovation: definitions, operationalization, and examples. Proc Natl Acad Sci 115(50):12638–12645
Korkmaz G, Kelling C, Robbins C, Keller SA (2018) Modeling the impact of R packages using dependency and contributor networks. In: 2018 IEEE/ACM international conference on advances in social networks analysis and mining (ASONAM). IEEE, pp 511–514
Krivitsky PN (2012) Exponential-family random graph models for valued networks. Electron J Stat 6:1100
Kumar S (2015) Co-authorship networks: a review of the literature. Aslib J Inf Manag 67(1):55–73
Lambiotte R, Delvenne JC, Barahona M (2008) Laplacian dynamics and multiscale modular structure in networks. arXiv preprint arXiv:0812.1770
Lee S, Bozeman B (2005) The impact of research collaboration on scientific productivity. Soc Stud Sci 35:673–702
Moody J (2004) The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999. Am Sociol Rev 69(2):213–238
Muenchen B (2017) R’s growth continues to accelerate
Newman ME (2001a) Scientific collaboration networks. I. Network construction and fundamental results. Phys Rev E 64(1):1–8
Newman ME (2001b) Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev E 64(1):1–7
Newman ME (2001c) The structure of scientific collaboration networks. PNAS 98(2):404–409
Newman ME (2004) Coauthorship networks and patterns of scientific collaboration. PNAS 101(suppl 1):5200–5205
Octoverse: the state of the Octoverse (2018) https://octoverse.github.com
Open Source Initiative (1998) https://opensource.org/osd
Piwowar H, Priem J (2016) Depsy: valuing the software that powers science. https://github.com/Impactstory/depsy-research/blob/master/introducing_depsy.md
Plone: About plone. https://plone.com/about
PyPI: Python Package Index (PyPI) https://pypi.org/
PYPL: PYPL PopularitY of Programming Language (2019) http://pypl.github.io/PYPL.html
Robbins C, Korkmaz G, Calderon JBS, Kelling C, Shipp SS, Keller S (2018) The scope and impact of open source software: a framework for analysis and preliminary cost estimates. In: 35th international association for research on income and wealth (IARIW) general conference. IARIW
Robbins C, Korkmaz G, Calderon JBS, Chen D, Schroeder A, Kelling C, Shipp SS, Keller S (2019) The scope and impact of open source software as intangible capital: a framework for measurement with an application based on the use of R packages. In: Big data for 21st century economic statistics. University of Chicago Press
Rossum GV (2009) A brief timeline of Python. https://python-history.blogspot.com/2009/01/brief-timeline-of-python.html. Accessed 5 Dec 2019
Singh Chawla D (2016) The unsung heroes of scientific software. Nat News 529(7584):115
Stack Overflow: Stack Overflow developer survey results: programming, scripting, and markup languages (2018) https://insights.stackoverflow.com/survey/2018/#technology-programming-scripting-and-markup-languages
Thiemichen S, Friel N, Caimo A, Kauermann G (2016) Bayesian exponential random graph models with nodal random effects. Soc Netw 46:11–28
TIOBE: TIOBE Index for January 2019 (2019) https://www.tiobe.com/tiobe-index/
Ube: Project ube. https://pypi.org/project/ube/
Uddin S, Hossain L, Rasmussen K (2013) Network effects on scientific collaborations. PLoS ONE 8(2):e57546
Venables B, Smith D, Gentleman R, Ihaka R (1998) Notes on R: a programming environment for data analysis and graphics
Ver Hoef JM, Boveng PL (2007) Quasi-Poisson versus negative binomial regression: how should we model overdispersed count data? Ecology 88(11):2766–2772
Wikipedia contributors: ABC (programming language)—Wikipedia, the free encyclopedia (2018) https://en.wikipedia.org/w/index.php?title=ABC_(programming_language)&oldid=852622792. Accessed 4 Feb 2019
Wikipedia contributors: Centrum Wiskunde & Informatica—Wikipedia, the free encyclopedia (2018) https://en.wikipedia.org/w/index.php?title=Centrum_Wiskunde%26_Informatica&oldid=870200085. Accessed 4 Feb 2019
Yan E, Ding Y (2009) Applying centrality measures to impact analysis: a coauthorship network analysis. J Assoc Inf Sci Technol 60(10):2107–2118
Yan E, Ding Y, Zhu Q (2010) Mapping library and information science in China: a coauthorship network analysis. Scientometrics 83:115–131
Ye Q, Li T, Law R (2013) A coauthorship network analysis of tourism and hospitality research collaboration. J Hosp Tour Res 37(1):51–76
Zhao R, Wei M (2017) Impact evaluation of open source software: an altmetrics perspective. Scientometrics 110:1017–1033
Acknowledgements
This material is based on work supported by the US Department of Agriculture (58-3AEU-7-0074) and the National Science Foundation under IGERT Grant DGE-1144860, Big Data Social Science. We acknowledge the Data Science for the Public Good Program 2017 participants Daniel Chen, Sayali Phadke, Eirik Iversen, and Ben Swartz.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
A preliminary version of the paper appeared in the Proceedings of 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (Korkmaz et al. 2018).
Cite this article
Korkmaz, G., Kelling, C., Robbins, C. et al. Modeling the impact of Python and R packages using dependency and contributor networks. Soc. Netw. Anal. Min. 10, 7 (2020). https://doi.org/10.1007/s13278-019-0619-1
DOI: https://doi.org/10.1007/s13278-019-0619-1