ABSTRACT
In this research we attempt to identify successful and unsuccessful projects on GitHub from a sample of 5000 randomly picked projects in a number of randomly selected languages (Java, PHP, JavaScript, C#/C++, HTML). We selected 1000 projects for each of these languages through the publicly available GitHub API, refined the dataset, and applied several machine learning algorithms to it. We first ran numerous queries against the dataset and found meaningful relationships and correlations among some of the fetched attributes that affect the popularity of these projects. Building on these results, we could later develop an application that determines the success or failure of a specific open source project.
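As a rough illustration of the data collection step, the sketch below builds the paginated request URLs needed to fetch up to 1000 repositories for one language from GitHub's public repository search endpoint. It is not the authors' collection script. The function name `build_search_urls` and the `language:` query are assumptions made for illustration, and a search query of this kind returns ranked results rather than the random sample described above. No network request is made.

```python
from math import ceil
from urllib.parse import urlencode

# GitHub's public repository search endpoint.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_urls(language, total=1000, per_page=100):
    """Return the page URLs needed to list `total` repositories for a language.

    Hypothetical helper: the paper does not say how its sample was queried.
    The search API serves at most 100 items per page and at most 1000
    results per query, so the defaults correspond to 10 pages.
    """
    pages = ceil(total / per_page)
    urls = []
    for page in range(1, pages + 1):
        params = {
            "q": f"language:{language}",  # assumed search qualifier
            "per_page": per_page,
            "page": page,
        }
        urls.append(f"{SEARCH_ENDPOINT}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls("Java", total=300):
        print(url)
```

Each URL can then be requested with any HTTP client, and the returned JSON attributes (for example stars, forks, and open issues) collected into the dataset that is refined before training.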