Abstract
Open-source software (OSS) licenses dictate the conditions that must be followed to reuse, distribute, and modify software. Apart from widely used licenses such as the MIT License, developers may also create their own licenses (called custom licenses), whose wording is more flexible. This variety of licenses makes it challenging to understand licenses and their compatibility. To avoid financial and legal risks, it is essential to ensure license compatibility when integrating third-party packages or reusing code accompanied by licenses. In this work, we propose LiDetector, an effective tool that extracts and interprets OSS licenses (including both official and custom licenses) and detects incompatibilities among them. Specifically, LiDetector introduces a learning-based method to automatically identify meaningful license terms in an arbitrary license, and employs a Probabilistic Context-Free Grammar (PCFG) to infer rights and obligations for incompatibility detection. Experiments demonstrate that LiDetector outperforms existing methods, achieving 93.28% precision for term identification and 91.09% accuracy for right and obligation inference. They also show that it effectively detects incompatibility with a false-positive (FP) rate of 10.06% and a false-negative (FN) rate of 2.56%. Furthermore, our large-scale empirical study of 1,846 projects using LiDetector reveals that 72.91% of the projects suffer from license incompatibility, including projects under popular licenses such as the MIT License and the Apache License. We highlight lessons learned from the perspectives of different stakeholders and make all related data and the replication package publicly available (https://github.com/XuSihan/LiDetector) to facilitate follow-up research.
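To make the final detection step concrete, here is a minimal Python sketch under simplifying assumptions. It assumes each license has already been reduced to a mapping from license terms to one of three attitudes (can, cannot, must), standing in for the rights and obligations that the inference step yields. The following are hypothetical illustrations, not LiDetector's actual implementation:

- the term names,
- the `Attitude` enum,
- the `find_conflicts` helper,
- the two conflict rules.

```python
from enum import Enum
from typing import Dict, List


class Attitude(Enum):
    """Stance a license takes toward a single term (simplified, assumed)."""
    CAN = "can"        # the license grants a right
    CANNOT = "cannot"  # the license forbids an action
    MUST = "must"      # the license imposes an obligation


# A license reduced to {term name: attitude}; term names are hypothetical.
License = Dict[str, Attitude]


def find_conflicts(project: License, component: License) -> List[str]:
    """Report terms on which a project's license contradicts the license
    of a reused component, using two illustrative (assumed) rules:
      1. the component forbids a term the project permits or requires;
      2. the component requires a term the project forbids.
    Terms the project license does not mention are skipped, which is a
    simplification, not necessarily how LiDetector treats them.
    """
    conflicts: List[str] = []
    for term, comp_att in component.items():
        proj_att = project.get(term)
        if proj_att is None:
            continue  # project license is silent on this term
        if comp_att is Attitude.CANNOT and proj_att in (Attitude.CAN, Attitude.MUST):
            conflicts.append(f"{term}: component says cannot, project says {proj_att.value}")
        elif comp_att is Attitude.MUST and proj_att is Attitude.CANNOT:
            conflicts.append(f"{term}: component says must, project says cannot")
    return conflicts


if __name__ == "__main__":
    # Rough, hand-written encodings for illustration only.
    permissive_project = {"commercial_use": Attitude.CAN, "sublicense": Attitude.CAN}
    custom_component = {"commercial_use": Attitude.CANNOT, "include_notice": Attitude.MUST}
    print(find_conflicts(permissive_project, custom_component))
    # prints ['commercial_use: component says cannot, project says can']
```

Under these assumed encodings, the custom component's ban on commercial use clashes with the permissive project license. The component's notice obligation is not flagged, because the project license says nothing about that term.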
Index Terms
- LiDetector: License Incompatibility Detection for Open Source Software