Abstract
Free and open source software (FOSS) plays an important role in source code reuse practice. FOSS projects usually come with one or more software licenses written in the header of their source files, stating the requirements and conditions that should be followed when the files are reused. When re-distributors remove or modify the license statement, the license of the file becomes inconsistent with that of its ancestor, which may potentially cause license infringement. In this paper, we describe and categorize different types of license inconsistencies and propose a method to detect them. We then apply this method to Debian 7.5 and a collection of 10,514 Java projects on GitHub and present the license inconsistency cases found in these systems. Through manual analysis, we summarize various reasons behind these license inconsistency cases, some of which imply potential license infringement and require attention from the developers. This analysis also exposes the difficulty of discovering license infringements, highlighting the usefulness of finding and maintaining source code provenance.
Notes
In this paper we use the FOSS license abbreviations defined by the Software Package Data Exchange (SPDX), found at http://spdx.org/licenses/.
Some licenses, such as the Mozilla tri-license (which allowed the reuse of the file under either the MPL-1.0, the GPL-2.0+ or the LGPL-2.1), allow the user to remove one or two licenses. Similarly, files are frequently licensed with the option to use newer versions of the license (indicated by the + sign in SPDX license abbreviations, such as GPL-2.0+).
https://www.debian.org/
https://github.com/
https://axis.apache.org/axis/cvs.html (Last access: Oct. 2nd, 2015)
Acknowledgments
This work is supported by the Japan Society for the Promotion of Science, Grant-in-Aid for Scientific Research (S) “Collecting, Analyzing, and Evaluating Software Assets for Effective Reuse” (No. 25220003) and the Osaka University Program for Promoting International Joint Research, “Software License Evolution Analysis”.
Additional information
Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei
Cite this article
Wu, Y., Manabe, Y., Kanda, T. et al. Analysis of license inconsistency in large collections of open source projects. Empir Software Eng 22, 1194–1222 (2017). https://doi.org/10.1007/s10664-016-9487-8
DOI: https://doi.org/10.1007/s10664-016-9487-8