Analysis of license inconsistency in large collections of open source projects

Empirical Software Engineering

Abstract

Free and open source software (FOSS) plays an important role in source code reuse practice. FOSS projects usually come with one or more software licenses written in the header of their source files, stating the requirements and conditions that must be followed when the code is reused. When re-distributors remove or modify the license statement, the license becomes inconsistent with that of its ancestor, which may cause license infringement. In this paper, we describe and categorize different types of license inconsistencies and propose a method to detect them. We then apply this method to Debian 7.5 and a collection of 10,514 Java projects on GitHub and present the license inconsistency cases found in these systems. Through a manual analysis, we summarize the various reasons behind these license inconsistency cases, some of which imply potential license infringement and require attention from the developers. This analysis also exposes the difficulty of discovering license infringements, highlighting the usefulness of finding and maintaining source code provenance.

Notes

  1. http://opensource.org

  2. http://www.fsf.org

  3. http://www.ifross.org/en/artikel/versata-saga-settled-prejudice-1

  4. http://opensource.org/definition

  5. http://www.blackducksoftware.com/products/knowledgebase

  6. In this paper we will use the abbreviations of FOSS licenses of the Software Package Data Exchange (SPDX), found at http://spdx.org/licenses/.

  7. Some licenses, such as the Mozilla tri-license (which allowed the reuse of the file under either the MPL-1.0, the GPL-2.0+ or the LGPL-2.1) allow the user to remove one or two licenses. Similarly, files are frequently licensed with the ability to use newer versions of the license (corresponding to the + sign in the SPDX abbreviations of license names, such as GPL-2.0+).

  8. https://www.debian.org/

  9. https://github.com/

  10. https://axis.apache.org/axis/cvs.html (Last access: Oct. 2nd, 2015)

  11. https://github.com/jcryptool/crypto

  12. https://github.com/Kalliope/minica
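The conventions described in notes 6 and 7 can be illustrated with a minimal sketch. It is not the detection method of the paper: the function names `parse_spdx_id` and `is_permitted_narrowing` are hypothetical, and the subset check is a deliberate simplification that assumes a file offers its licenses as alternatives (as in the tri-license example of note 7) and ignores all other legal considerations.

```python
from typing import FrozenSet, Tuple


def parse_spdx_id(identifier: str) -> Tuple[str, bool]:
    """Split an SPDX-style abbreviation into (base name, or_later flag).

    A trailing '+' marks "this version or any later version" (see note 7).
    """
    identifier = identifier.strip()
    if identifier.endswith("+"):
        return identifier[:-1], True
    return identifier, False


def is_permitted_narrowing(original: FrozenSet[str],
                           redistributed: FrozenSet[str]) -> bool:
    """Return True when a redistributor only dropped alternatives.

    Assumption: `original` is a set of alternative licenses, so keeping any
    non-empty subset is allowed, while introducing a new license is not.
    """
    return bool(redistributed) and redistributed <= original


# Tri-license alternatives exactly as listed in note 7.
tri_license = frozenset({"MPL-1.0", "GPL-2.0+", "LGPL-2.1"})

print(parse_spdx_id("GPL-2.0+"))
print(is_permitted_narrowing(tri_license, frozenset({"GPL-2.0+"})))
print(is_permitted_narrowing(tri_license, frozenset({"Apache-2.0"})))
```

Under these assumptions, keeping only GPL-2.0+ counts as a permitted narrowing, whereas replacing the header with a license outside the original set would be flagged for manual inspection.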

Acknowledgments

This work is supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (S) “Collecting, Analyzing, and Evaluating Software Assets for Effective Reuse” (No. 25220003), and by the Osaka University Program for Promoting International Joint Research, “Software License Evolution Analysis”.

Author information

Corresponding author

Correspondence to Yuhao Wu.

Additional information

Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei

About this article

Cite this article

Wu, Y., Manabe, Y., Kanda, T. et al. Analysis of license inconsistency in large collections of open source projects. Empir Software Eng 22, 1194–1222 (2017). https://doi.org/10.1007/s10664-016-9487-8

  • DOI: https://doi.org/10.1007/s10664-016-9487-8
