Open Source License Inconsistencies on GitHub

Published: 22 July 2023

Abstract

Almost all software, open or closed, builds on open source software and therefore needs to comply with the license obligations of the open source code. Not knowing which licenses to comply with poses a legal danger to anyone using open source software. This article investigates the extent of inconsistencies between licenses declared by an open source project at the top level of the repository and the licenses found in the code. We analyzed a sample of 1,000 open source GitHub repositories. We find that about half of the repositories did not fully declare all licenses found in the code. Of these, approximately 10% represented a permissive vs. copyleft license mismatch. Furthermore, existing tools cannot fully identify licenses. We conclude that users of open source code should not just look at the declared licenses of the open source code they intend to use, but rather examine the software to understand its actual licenses.
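The comparison described in the abstract, declared licenses versus licenses found in the code, can be illustrated with a small set-based sketch. This is not the article's tooling: the function names, the SPDX identifiers, and the short list of copyleft licenses are hypothetical choices made for illustration, and the mismatch rule is only one plausible reading of "permissive vs. copyleft license mismatch".

```python
# Illustrative sketch only: names, identifiers, and the copyleft list are assumptions.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only", "LGPL-2.1-only", "MPL-2.0"}


def undeclared_licenses(declared, found):
    """Return licenses found in the code that are not declared at the top level."""
    return set(found) - set(declared)


def has_permissive_copyleft_mismatch(declared, found):
    """True when no declared license is copyleft but an undeclared found license is."""
    declared_copyleft = set(declared) & COPYLEFT
    undeclared_copyleft = undeclared_licenses(declared, found) & COPYLEFT
    return not declared_copyleft and bool(undeclared_copyleft)


# Made-up repository: declares MIT, but a scanner also reports two other licenses.
declared = {"MIT"}
found = {"MIT", "Apache-2.0", "GPL-3.0-only"}

missing = undeclared_licenses(declared, found)
print(sorted(missing))                                    # ['Apache-2.0', 'GPL-3.0-only']
print(has_permissive_copyleft_mismatch(declared, found))  # True
```

In practice the `found` set would come from a license scanner such as those named in the appendix tables, and the result depends on how completely that scanner identifies licenses.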
Appendix

A Statistics

Table A.1. Kruskal-Wallis Correlation calculations for Nomos
variable             statistic    p-value
size                 185.1912     0.0000
stargazers_count     84.9197      0.0000
subscribers_count    101.1065     0.0000
forks_count          103.7719     0.0000
open_issues_count    133.4003     0.0000
fork                 10.3001      0.0162
has_issues           6.4375       0.0922
has_projects         11.2455      0.0105
has_downloads        1.2762       0.7348
has_wiki             19.1209      0.0003
has_pages            15.0404      0.0018

Table A.2. Kruskal-Wallis Correlation calculations for ScanCode
variable             statistic    p-value
size                 178.1689     0.0000
stargazers_count     90.8613      0.0000
subscribers_count    106.4716     0.0000
forks_count          112.1345     0.0000
open_issues_count    141.2797     0.0000
fork                 3.35069      0.34065
has_issues           3.87766      0.27498
has_projects         12.22392     0.00665
has_downloads        1.03154      0.79362
has_wiki             19.73876     0.00019
has_pages            9.92883      0.01918

Table A.3. Kruskal-Wallis Correlation calculations for hybrid
variable             statistic    p-value
size                 200.8115     0.0000
stargazers_count     87.3538      0.0000
subscribers_count    98.3808      0.0000
forks_count          107.8177     0.0000
open_issues_count    133.2991     0.0000
fork                 10.5591      0.0144
has_issues           6.587        0.0863
has_projects         11.4895      0.0094
has_downloads        1.3913       0.7076
has_wiki             19.0357      0.0003
has_pages            14.7405      0.0021
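
For readers unfamiliar with the statistic reported in Tables A.1 to A.3, the following is a minimal sketch of how a Kruskal-Wallis H value is computed from grouped observations. It is not the authors' analysis code: the function name and the sample data are made up, and this excerpt does not state how repositories were grouped. The p-value is conventionally obtained by comparing H against a chi-square distribution with (number of groups - 1) degrees of freedom, which is omitted here.

```python
from collections import defaultdict


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks for ties, with tie correction."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Collect the 1-based positions of each distinct value, then average them.
    positions = defaultdict(list)
    for index, value in enumerate(pooled, start=1):
        positions[value].append(index)
    rank = {value: sum(idx) / len(idx) for value, idx in positions.items()}

    # H = 12 / (n (n + 1)) * sum_j(R_j^2 / n_j) - 3 (n + 1)
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank[v] for v in group) ** 2 / len(group) for group in groups
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(len(idx) ** 3 - len(idx) for idx in positions.values())
    return h / (1 - ties / (n ** 3 - n))


# Made-up example: one numeric attribute measured in three groups of repositories.
sample_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_value = kruskal_wallis_h(sample_groups)
print(round(h_value, 4))  # 7.2
```

A larger H means the rank sums of the groups differ more than would be expected if all groups came from the same distribution.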

Published In

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 5
September 2023
905 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3610417
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2023
Online AM: 08 December 2022
Accepted: 22 September 2022
Revised: 13 September 2022
Received: 25 January 2022
Published in TOSEM Volume 32, Issue 5

Author Tags

  1. License management
  2. license conflicts

Qualifiers

  • Research-article
