On the impact of using trivial packages: an empirical case study on npm and PyPI

Abdalkareem, Rabe; Oda, Vinicius; Mujahid, Suhaib; Shihab, Emad

doi:10.1007/s10664-019-09792-9

Published: 09 January 2020

Volume 25, pages 1168–1204, (2020)

Empirical Software Engineering

Rabe Abdalkareem¹ (ORCID: orcid.org/0000-0001-9914-5434), Vinicius Oda¹, Suhaib Mujahid¹ & Emad Shihab¹

Abstract

Code reuse has traditionally been encouraged since it enables one to avoid re-inventing the wheel. However, after the npm left-pad incident, in which a trivial package led to the breakdown of some of the most popular web applications such as Facebook and Netflix, some questioned such reuse. Reuse of trivial packages is particularly prevalent in platforms such as npm. To date, there is no study that examines why developers reuse trivial packages on platforms other than npm. Therefore, in this paper, we study two large platforms, npm and PyPI. We mine more than 500,000 npm packages and 38,000 JavaScript applications, and more than 63,000 PyPI packages and 14,000 Python applications, to study the prevalence of trivial packages. We found that trivial packages are common, making up between 10.5% and 16.0% of the packages on the studied platforms. We performed surveys with 125 developers who use trivial packages to understand the reasons for and drawbacks of their use. Our surveys revealed that trivial packages are used because they are perceived to be well-implemented and well-tested pieces of code. However, developers are concerned about maintenance and the risk of breakages due to the extra dependencies that trivial packages introduce. To objectively verify the survey results, we validate the most cited reason and drawback. We find that, contrary to developers’ beliefs, only around 28% of npm and 49% of PyPI trivial packages have tests. However, trivial packages appear to be ‘deployment tested’ and to have test, usage and community interest levels similar to those of non-trivial packages. On the other hand, we found that 18.4% and 2.9% of the studied trivial packages have more than 20 dependencies in npm and PyPI, respectively.

Notes

In this paper, we use the term package to refer to a software library that is published on the studied package management platforms.
Note that if a package is required in the application, but does not exist, it will break the application.
It is important to note that the motivation and full derivation of the metrics (e.g., why they put a weight of 0.15 on the test coverage) are beyond the scope of this paper. We refer interested readers to the npms documentation for more details (Cruz and Duarte 2017). To make our paper self-sufficient, we include how the metrics are calculated here.
We modified the npm code to intercept the install call and counted the installations needed for every package.
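
The last note describes counting the installations that each package triggers. As a rough illustration only, and not the authors' instrumentation of npm, the same quantity can be approximated from an already known dependency graph with a breadth-first traversal. The function name `count_installs`, the graph layout, and the package names in `toy_graph` are all hypothetical and chosen purely for this sketch.

```python
from collections import deque


def count_installs(package, dependencies):
    """Count the distinct packages pulled in, directly or transitively,
    when `package` is installed.

    `dependencies` is a hypothetical mapping from a package name to the
    list of its direct dependencies; it is an assumption of this sketch.
    """
    seen = set()
    queue = deque(dependencies.get(package, ()))
    while queue:
        dep = queue.popleft()
        # Skip packages already counted and guard against cycles
        # that lead back to the root package.
        if dep in seen or dep == package:
            continue
        seen.add(dep)
        queue.extend(dependencies.get(dep, ()))
    return len(seen)


# Toy graph with made-up package names (not taken from the study's data).
toy_graph = {
    "app": ["pad-string", "util"],
    "util": ["pad-string", "helper"],
    "helper": [],
    "pad-string": [],
}

if __name__ == "__main__":
    for name in toy_graph:
        print(name, count_installs(name, toy_graph))
```

Under this sketch, a package would fall into the "more than 20 dependencies" group reported in the abstract when `count_installs` returns a value above 20, assuming that figure refers to the total number of installed dependencies.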

References

Abate P, Di Cosmo R, Boender J, Zacchiroli S (2009) Strong dependencies between software components. In: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, ESEM ’09, IEEE Computer Society, pp 89–99
Abdalkareem R (2017) Reasons and drawbacks of using trivial npm packages: The developers’ perspective. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, ACM, pp 1062–1064
Abdalkareem R, Nourry O, Wehaibi S, Mujahid S, Shihab E (2017) Why do developers use trivial packages? an empirical case study on npm. In: Proceedings of the 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE ’17, ACM, pp 385–395
Abdalkareem R, Oda V, Mujahid S, Shihab E (2019) On the impact of using trivial packages: An empirical case study on npm and pypi. https://doi.org/10.5281/zenodo.3095009
Abdalkareem R, Shihab E, Rilling J (2017) On code reuse from Stack Overflow: An exploratory study on Android apps. Inf Softw Technol 88(C):148–158
Abdalkareem R, Shihab E, Rilling J (2017) What do developers use the crowd for? a study using Stack Overflow. IEEE Softw 34(2):53–60
Baltes S, Diehl S (2018) Usage and attribution of Stack Overflow code snippets in GitHub projects. Empirical Software Engineering
Basili VR, Briand LC, Melo WL (1996) How reuse influences productivity in object-oriented systems. Commun ACM 39(10):104–116
Bavota G, Canfora G, Penta MD, Oliveto R, Panichella S (2013) The evolution of project inter-dependencies in a software ecosystem: The case of Apache. In: Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM ’13, IEEE Computer Society, pp 280–289
Blais M snakefood: Python Dependency Graphs. http://furius.ca/snakefood/. (accessed on 09/23/2018)
Bloemen R, Amrit C, Kuhlmann S, Ordóñez Matamoros G (2014) Gentoo package dependencies over time. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR ’14, ACM, pp 404–407
Bogart C, Kästner C, Herbsleb J (2015) When it breaks, it breaks: How ecosystem developers reason about the stability of dependencies. In: Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop, ASEW ’15, IEEE Computer Society, pp 86–89
Bogart C, Kästner C, Herbsleb J, Thung F (2016) How to break an API: Cost negotiation and community values in three software ecosystems. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE ’16, ACM, pp 109–120
Bower (2012) Bower a package manager for the web. https://bower.io/. (accessed on 08/23/2016)
Castelluccio M, An L, Khomh F (2019) An empirical study of patch uplift in rapid release development pipelines. Empir Softw Eng 24(5):3008–3044
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
Cruz A, Duarte A (2017) npms. https://npms.io/. (accessed on 02/20/2017)
de Souza CRB, Redmiles DF (2008) An empirical study of software developers’ management of dependencies and changes. In: Proceedings of the 30th International Conference on Software Engineering, ICSE ’08, ACM, pp 241–250
Decan A, Mens T, Constantinou E (2018a) On the impact of security vulnerabilities in the npm package dependency network. In: International Conference on Mining Software Repositories
Decan A, Mens T, Grosjean P (2018b) An empirical comparison of dependency network evolution in seven software packaging ecosystems. Empirical Software Engineering
Decan A, Mens T, Grosjean P, et al. (2016) When github meets CRAN: an analysis of inter-repository package dependency problems. In: Proceedings of the 23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering, volume 1 of SANER ’16, IEEE, pp 493–504
Di Cosmo R, Di Ruscio D, Pelliccione P, Pierantonio A, Zacchiroli S (2011) Supporting software evolution in component-based FOSS systems. Sci Comput Program 76(12):1144–1160
Dogguy M, Glondu S, Le Gall S, Zacchiroli S (2011) Enforcing type-safe linking using inter-package relationships. Studia Informatica Universalis 9(1):129–157
Ebert C, Cain J (2016) Cyclomatic complexity. IEEE Softw 33(6):27–29
Fleiss JL, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas 33:613–619
Flyvbjerg B (2006) Five misunderstandings about case-study research. Qual Inq 12(2):219–245
Fuchs T (2016) What if we had a great standard library in JavaScript? – medium. https://medium.com/@thomasfuchs/what-if-we-had-a-great-standard-library-in-javascript-52692342ee3f.pw7d4cq8j. (accessed on 02/24/2017)
German D, Adams B, Hassan A (2013) Programming language ecosystems: the evolution of R. In: Proceedings of the 17th European Conference on Software Maintenance and Reengineering, CSMR ’13, IEEE, pp 243–252
Gousios G, Vasilescu B, Serebrenik A, Zaidman A (2014) Lean GHTorrent: GitHub data on demand. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR ’14, ACM, pp 384–387
Grissom RJ, Kim JJ (2005) Effect sizes for research: A broad practical approach. Lawrence Erlbaum Associates Publishers
Haefliger S, Von Krogh G, Spaeth S (2008) Code reuse in open source software. Manag Sci 54(1):180–193
Haney D (2016) Npm & left-pad: Have we forgotten how to program? http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/. (accessed on 08/10/2016)
Harris R (2015) Small modules: it’s not quite that simple. https://medium.com/@Rich_Harris/small-modules-it-s-not-quite-that-simple-3ca532d65de4. (accessed on 08/24/2016)
Hemanth HM (2015) One-line node modules -issue#10- sindresorhus/ama. https://github.com/sindresorhus/ama/issues/10. (accessed on 08/10/2016)
Höst M, Regnell B, Wohlin C (2000) Using students as subjects—a comparative study of students and professionals in lead-time impact assessment. Empir Softw Eng 5(3):201–214
Hunter JE (2001) The desperate need for replications. J Consum Res 28(1):149–158
Inoue K, Sasaki Y, Xia P, Manabe Y (2012) Where does this code come from and where does it go? - integrated code history tracker for open source systems -. In: Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, IEEE Press, pp 331–341
Kabbedijk J, Jansen S (2011) Steering insight: An exploration of the Ruby software ecosystem. In: Proceedings of the Second International Conference of Software Business, ICSOB ’11, Springer, pp 44–55
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR ’14, ACM, pp 92–101
Kula RG, Roover CD, German DM, Ishio T, Inoue K (2018) A generalized model for visualizing library popularity, adoption, and diffusion within a software ecosystem. In: 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering, volume 00 of SANER ’18, pp 288–299
Libraries.io. Libraries.io - the open source discovery service. https://libraries.io/. (accessed on 05/20/2018)
Libraries.io (2017) Pypi. https://libraries.io/pypi. (accessed on 03/08/2017)
Lim WC (1994) Effects of reuse on quality, productivity, and economics. IEEE Softw 11(5):23–30
Macdonald F (2016) A programmer almost broke the Internet last week by deleting 11 lines of code. http://www.sciencealert.com/how-a-programmer-almost-broke-the-internet-by-deleting-11-lines-of-code. (accessed on 08/24/2016)
Manikas K (2016) Revisiting software ecosystems research: a longitudinal literature study. J Syst Softw 117:84–103
McCamant S, Ernst MD (2003) Predicting problems caused by component upgrades. In: Proceedings of the 9th European Software Engineering Conference Held Jointly with 11th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE ’03, ACM, pp 287–296
Mirhosseini S, Parnin C (2017) Can automated pull requests encourage software developers to upgrade out-of-date dependencies? In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE ’17, IEEE Press, pp 84–94
Mockus A (2007) Large-scale code reuse in open source software. In: Proceedings of the First International Workshop on Emerging Trends in FLOSS Research and Development, FLOSS ’07, IEEE Computer Society, p 7–
Mohagheghi P, Conradi R, Killi OM, Schwarz H (2004) An empirical study of software reuse vs. defect-density and stability. In: Proceedings of the 26th International Conference on Software Engineering, ICSE ’04, IEEE Computer Society, pp 282–292
npm (2016) What is npm? — node package management documentation. https://docs.npmjs.com/getting-started/what-is-npm. (accessed on 08/14/2016)
npm Blog T (2016) The npm blog changes to npm’s unpublish policy. http://blog.npmjs.org/post/141905368000/changes-to--unpublish-policy. (accessed on 08/11/2016)
Orsila H, Geldenhuys J, Ruokonen A, Hammouda I (2008) Update propagation practices in highly reusable open source components. In: Proceedings of the 4th IFIP WG 2.13 International Conference on Open Source Systems, OSS ’08, pp 159–170
Patra J, Dixit PN, Pradel M (2018) ConflictJS: Finding and understanding conflicts between JavaScript libraries. In: Proceedings of the 40th International Conference on Software Engineering, ICSE ’18, ACM, pp 741–751
Python Python testing tools taxonomy - python wiki. https://wiki.python.org/moin/PythonTestingToolsTaxonomy. (accessed on 05/16/2018)
Rahman MT, Rigby PC, Shihab E (2019) The modular and feature toggle architectures of google chrome. Empir Softw Eng 24(2):826–853
Ray B, Posnett D, Filkov V, Devanbu P (2014) A large scale study of programming languages and code quality in GitHub. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE ’14, ACM, pp 155–165
Salman I, Misirli AT, Juristo N (2015) Are students representatives of professionals in software engineering experiments? In: 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, volume 1 of ICSE ’15, IEEE, pp 666–676
SciTools Understand tool. https://scitools.com/. (accessed on 04/16/2019)
Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng 25(4):557–572
Singer J, Sim SE, Lethbridge TC (2008) Software engineering data collection for field studies. In: Guide to Advanced Empirical Software Engineering. Springer, London, pp 9–34
Sjoberg DIK, Anda B, Arisholm E, Dyba T, Jorgensen M, Karahasanovic A, Koren EF, Vokac M (2002) Conducting realistic experiments in software engineering. In: Proceedings International Symposium on Empirical Software Engineering, IEEE, pp 17–26
Sojer M, Henkel J (2010) Code reuse in open source software development: Quantitative evidence, drivers, and impediments. J Assoc Inf Syst 11(12):868–901
Trockman A, Zhou S, Kästner C, Vasilescu B (2018) Adding sparkle to social coding: an empirical study of repository badges in the npm ecosystem. In: Proceedings of the International Conference on Software Engineering, ICSE ’18, ACM
Tsay J, Dabbish L, Herbsleb J (2014) Influence of social and technical factors for evaluating contribution in GitHub. In: Proceedings of the 36th International Conference on Software Engineering, ICSE ’14, ACM, pp 356–366
Valiev M, Vasilescu B, Herbsleb J (2018) Ecosystem-level determinants of sustained activity in open-source projects: A case study of the PyPI ecosystem. In: Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE ’18. ACM
Vasilescu B, Yu Y, Wang H, Devanbu P, Filkov V (2015) Quality and productivity outcomes relating to continuous integration in GitHub. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE ’15, ACM, pp 805–816
Williams C (2016) How one developer just broke Node, Babel and thousands of projects in 11 lines of JavaScript. http://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos. (accessed on 08/24/2016)
Wittern E, Suter P, Rajagopalan S (2016) A look at the dynamics of the JavaScript package ecosystem. In: Proceedings of the 13th International Conference on Mining Software Repositories, MSR ’16, ACM, pp 351–361
Wu Y, Wang S, Bezemer C-P, Inoue K (2018) How do developers utilize source code from Stack Overflow? Empirical Software Engineering
Zambonini D (2011) A Practical Guide to Web App Success, chapter 20. Five Simple Steps. (accessed on 02/23/2017). In: Gregory O (ed)
Zhu J, Zhou M, Mockus A (2014) Patterns of folder use and project popularity: A case study of GitHub repositories. In: Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’14, ACM, pp 30:1–30:4

Acknowledgments

The authors are grateful to the many survey respondents who dedicated their valuable time to respond to our surveys. The authors would also like to thank the anonymous reviewers and the editor for their thoughtful feedback and suggestions, which helped us improve our study.

Author information

Authors and Affiliations

Data-Driven Analysis of Software (DAS) Lab, Department of Computer Science and Software Engineering, Concordia University, Montréal, Canada
Rabe Abdalkareem, Vinicius Oda, Suhaib Mujahid & Emad Shihab

Corresponding author

Correspondence to Rabe Abdalkareem.

Additional information

Communicated by: Arie van Deursen

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Abdalkareem, R., Oda, V., Mujahid, S. et al. On the impact of using trivial packages: an empirical case study on npm and PyPI. Empir Software Eng 25, 1168–1204 (2020). https://doi.org/10.1007/s10664-019-09792-9

Published: 09 January 2020
Issue Date: March 2020
DOI: https://doi.org/10.1007/s10664-019-09792-9
