DOI: 10.1145/3422392.3422501
research-article

Data mining tool to discover DevOps trends from public repositories: Predicting Release Candidates with gthbmining.rc

Published: 21 December 2020

Abstract

Public repositories play an essential role in bringing software and services to technical communities and general users. In most cases, a public repository is backed by a DevOps tool, with a live and historical database behind it, that supports delivery and every step the software or service should go through before reaching production. This paper introduces gthbmining, a set of data mining tools for discovering DevOps trends in public repositories, and presents its module gthbmining.rc. Given a public GitHub repository, the main contribution is predicting release candidates, an important label a software release can carry. The methodology, architecture, components, and interfaces are explained, along with the potential users. The results show a reliable and flexible tool: classifier metrics and graphics are provided, and new data mining algorithms can be added to the open source module. Related work is also discussed, and the conclusion summarizes the outcomes gthbmining.rc can provide.
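
As a rough illustration of the kind of classifier metrics mentioned in the abstract, the sketch below computes accuracy, precision, and recall for a binary "release candidate / not release candidate" labeling. It is not taken from gthbmining.rc: the function name `rc_metrics`, the boolean label encoding, and the sample labels are all assumptions made for illustration only.

```python
from typing import Dict, Sequence


def rc_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, and recall for binary RC labels.

    Hypothetical helper, not part of gthbmining.rc. True means the release
    is labeled (or predicted) as a release candidate.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # RC predicted as RC
    fp = sum(1 for a, p in pairs if not a and p)      # non-RC predicted as RC
    fn = sum(1 for a, p in pairs if a and not p)      # RC predicted as non-RC
    tn = sum(1 for a, p in pairs if not a and not p)  # non-RC predicted as non-RC

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fall back to 0.0 when the denominator is empty.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for six releases (assumed data, not results from the paper).
sample_actual = [True, False, True, True, False, False]
sample_predicted = [True, False, False, True, True, False]
sample_metrics = rc_metrics(sample_actual, sample_predicted)
print(sample_metrics)
```

Reporting precision and recall alongside accuracy is a common choice when one class is rare, which may well be the case for release candidates among all releases of a repository.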


Cited By

  • (2021) A Mining Software Repository Extended Cookbook: Lessons learned from a literature review. Proceedings of the XXXV Brazilian Symposium on Software Engineering, 1-10. DOI: 10.1145/3474624.3474627. Online publication date: 27-Sep-2021


        Published In

        SBES '20: Proceedings of the XXXIV Brazilian Symposium on Software Engineering
        October 2020
        901 pages
        ISBN:9781450387538
        DOI:10.1145/3422392

        In-Cooperation

        • SBC: Brazilian Computer Society

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. Data Mining
        2. DevOps
        3. GitHub Mining Tool
        4. Release Candidate

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        SBES '20

        Acceptance Rates

        Overall Acceptance Rate 147 of 427 submissions, 34%
