
An exploratory study of deep learning supply chain

Published: 05 July 2022

Abstract

Deep learning has become the driving force behind many contemporary technologies and has been successfully applied in many fields. Through software dependencies, a multi-layer supply chain (SC), with a deep learning framework at its core and a large number of downstream projects at its periphery, has gradually formed and is constantly developing. However, basic knowledge about the structure and characteristics of the SC is lacking, which hinders effective support for its sustainable development. Previous studies on software SCs usually focus on the packages in different registries without paying attention to the SCs derived from a single project. We present an empirical study of two deep learning SCs: the TensorFlow and PyTorch SCs. By constructing and analyzing these SCs, we aim to understand their structure, application domains, and evolutionary factors. We find that both SCs exhibit a short and sparse hierarchical structure. Overall, the relative growth of new projects increases month by month. Projects tend to attract downstream projects shortly after the release of their packages; later, the growth becomes faster and tends to stabilize. We propose three criteria to identify vulnerabilities and identify 51 types of packages and 26 types of projects involved in the two SCs. A comparison reveals their similarities and differences, e.g., the TensorFlow SC provides a wealth of packages for experiment result analysis, while the PyTorch SC contains more specific framework packages. By fitting a generalized additive model (GAM), we find that the number of dependent packages is significantly negatively associated with the number of downstream projects, but the relationship with the number of authors is nonlinear. Our findings can help further open the "black box" of deep learning SCs and provide insights for their healthy and sustainable development.
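To make the idea of a multi-layer SC concrete, the following minimal sketch assigns each project to a layer by its shortest dependency distance from the core framework. This is an illustration under stated assumptions, not the construction procedure used in the study: the function name `build_supply_chain`, the layer definition (shortest path via breadth-first search), and the toy project names are all hypothetical.

```python
from collections import defaultdict, deque


def build_supply_chain(dependencies, core):
    """Assign each project a layer: its shortest dependency distance from `core`.

    `dependencies` is an iterable of (project, depends_on) pairs.
    The layer definition is an assumption made for this sketch.
    """
    # Reverse the edges: map each upstream package to its direct dependents.
    downstream = defaultdict(set)
    for project, upstream in dependencies:
        downstream[upstream].add(project)

    # Breadth-first search outward from the core framework.
    layers = {core: 0}
    queue = deque([core])
    while queue:
        node = queue.popleft()
        for child in sorted(downstream[node]):
            if child not in layers:
                layers[child] = layers[node] + 1
                queue.append(child)
    return layers


# Hypothetical toy edges (project, depends_on); not data from the study.
toy_edges = [
    ("vision-lib", "framework"),
    ("nlp-lib", "framework"),
    ("app-a", "vision-lib"),
    ("app-b", "framework"),
    ("app-b", "nlp-lib"),
]

layers = build_supply_chain(toy_edges, core="framework")
depth = max(layers.values())  # number of layers below the core

# Count how many projects sit in each layer.
layer_sizes = defaultdict(int)
for layer in layers.values():
    layer_sizes[layer] += 1
```

In this toy graph, `depth` measures how "short" the hierarchy is, and `layer_sizes` shows how projects are distributed across layers; a real analysis would need dependency data mined from package registries or repositories.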

Published In

ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN:9781450392211
DOI:10.1145/3510003
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. open source
  3. software evolution
  4. software structure
  5. software supply chain

Qualifiers

  • Research-article

Funding Sources

  • the National Natural Science Foundation of China Grant

Conference

ICSE '22
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
