research-article

LiDetector: License Incompatibility Detection for Open Source Software

Published: 13 February 2023

Abstract

Open-source software (OSS) licenses dictate the conditions that must be followed to reuse, distribute, and modify software. Apart from widely used licenses such as the MIT License, developers are also allowed to write their own licenses (called custom licenses), whose descriptions are more flexible. The variety of such licenses makes it challenging to understand licenses and their compatibility. To avoid financial and legal risks, it is essential to ensure license compatibility when integrating third-party packages or reusing code accompanied by licenses. In this work, we propose LiDetector, an effective tool that extracts and interprets OSS licenses (both official and custom licenses) and detects incompatibility among them. Specifically, LiDetector introduces a learning-based method to automatically identify meaningful license terms from an arbitrary license, and employs a Probabilistic Context-Free Grammar (PCFG) to infer rights and obligations for incompatibility detection. Experiments demonstrate that LiDetector outperforms existing methods, with 93.28% precision for term identification and 91.09% accuracy for right and obligation inference, and can effectively detect incompatibility with a 10.06% FP rate and a 2.56% FN rate. Furthermore, with LiDetector, our large-scale empirical study on 1,846 projects reveals that 72.91% of the projects suffer from license incompatibility, including cases that involve popular licenses such as the MIT License and the Apache License. We highlight lessons learned from the perspectives of different stakeholders and make all related data and the replication package publicly available to facilitate follow-up research.
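
The abstract does not spell out how inferred rights and obligations are compared, so the following is only a minimal sketch under stated assumptions, not LiDetector's implementation. It assumes each license has already been reduced to a mapping from a term to an attitude (can, cannot, or must), and it uses a hypothetical conflict rule: a project license conflicts with a component license on a term when the project permits or requires what the component forbids, or forbids what the component requires. The names Attitude, find_conflicts, and the term labels are illustrative.

```python
from enum import Enum


class Attitude(Enum):
    """Hypothetical attitude labels for a license term."""
    CAN = "can"
    CANNOT = "cannot"
    MUST = "must"


def find_conflicts(project, component):
    """Return the terms on which two licenses conflict.

    Assumed rule (not taken from the paper): a conflict arises when the
    component license forbids something the project license permits or
    requires, or requires something the project license forbids.
    Terms that appear in only one license are ignored in this sketch.
    """
    conflicts = []
    for term in sorted(project.keys() & component.keys()):
        p, c = project[term], component[term]
        forbids_granted = c is Attitude.CANNOT and p in (Attitude.CAN, Attitude.MUST)
        requires_forbidden = c is Attitude.MUST and p is Attitude.CANNOT
        if forbids_granted or requires_forbidden:
            conflicts.append(term)
    return conflicts


# Illustrative, made-up term sets; they do not describe any real license.
project_license = {
    "distribute": Attitude.CAN,
    "sublicense": Attitude.CAN,
    "disclose_source": Attitude.CANNOT,
}
component_license = {
    "distribute": Attitude.CAN,
    "sublicense": Attitude.CANNOT,
    "disclose_source": Attitude.MUST,
}

print(find_conflicts(project_license, component_license))
```

The rule is deliberately directional: a project license that is stricter than its component (for example, forbidding what the component merely permits) is not reported, which is one plausible reading of compatibility rather than a confirmed one.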



        • Published in

          ACM Transactions on Software Engineering and Methodology, Volume 32, Issue 1
          January 2023
          954 pages
          ISSN: 1049-331X
          EISSN: 1557-7392
          DOI: 10.1145/3572890
          • Editor: Mauro Pezzè

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 13 February 2023
          • Online AM: 19 May 2022
          • Accepted: 14 February 2022
          • Revised: 15 December 2021
          • Received: 13 August 2021


          Qualifiers

          • research-article
          • Refereed