research-article

Open Source License Inconsistencies on GitHub

Published: 22 July 2023

Abstract

Almost all software, open or closed, builds on open source software and therefore needs to comply with the license obligations of the open source code. Not knowing which licenses to comply with poses a legal danger to anyone using open source software. This article investigates the extent of inconsistencies between licenses declared by an open source project at the top level of the repository and the licenses found in the code. We analyzed a sample of 1,000 open source GitHub repositories. We find that about half of the repositories did not fully declare all licenses found in the code. Of these, approximately 10% represented a permissive vs. copyleft license mismatch. Furthermore, existing tools cannot fully identify licenses. We conclude that users of open source code should not just look at the declared licenses of the open source code they intend to use, but rather examine the software to understand its actual licenses.


Published in

ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 5
September 2023
905 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/3610417
Editor: Mauro Pezzè

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 22 July 2023
• Online AM: 8 December 2022
• Accepted: 22 September 2022
• Revised: 13 September 2022
• Received: 25 January 2022
