DOI: 10.1145/2901739.2901751
research-article

Using dynamic and contextual features to predict issue lifetime in GitHub projects

Published: 14 May 2016

Abstract

Methods for predicting issue lifetime can help software project managers prioritize issues and allocate resources accordingly. Previous studies on issue lifetime prediction have focused on models built from static features, meaning features calculated at one snapshot of the issue's lifetime based on data associated with the issue itself. However, during its lifetime, an issue typically receives comments from various stakeholders, which may carry valuable insights into its perceived priority and difficulty and may thus be exploited to update lifetime predictions. Moreover, the lifetime of an issue depends not only on characteristics of the issue itself, but also on the state of the project as a whole. Hence, issue lifetime prediction may benefit from taking into account features that capture the issue's context (contextual features). In this work, we analyze issues from more than 4000 GitHub projects and build models to predict, at different points in an issue's lifetime, whether or not the issue will close within a given calendar period, by combining static, dynamic and contextual features. The results show that dynamic and contextual features complement the predictive power of static ones, particularly for long-term predictions.
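
To make the three feature families concrete, the following is a minimal Python sketch of how one training row might be assembled at a given observation point. It is an illustration under stated assumptions, not the authors' pipeline: the `Issue` record, the function names `closes_within` and `features`, and the three example features (body length, comments so far, open issues in the project) are all hypothetical choices that do not come from the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Issue:
    """Hypothetical issue record; the field names are assumptions."""
    created_at: datetime
    closed_at: Optional[datetime] = None  # None while the issue is still open
    body: str = ""
    comment_times: List[datetime] = field(default_factory=list)


def closes_within(issue: Issue, observed_at: datetime, horizon: timedelta) -> bool:
    """Binary label: does the issue close within `horizon` after `observed_at`?

    Issues with no recorded close date are treated as negatives in this
    sketch. Issues already closed at `observed_at` also yield False, so
    callers should filter them out before building a training set.
    """
    if issue.closed_at is None:
        return False
    return observed_at < issue.closed_at <= observed_at + horizon


def features(issue: Issue, project_issues: List[Issue], observed_at: datetime) -> Dict[str, int]:
    """One example feature per family, all computed as of `observed_at`."""
    # Static: fixed when the issue is created.
    body_length = len(issue.body)
    # Dynamic: changes as the issue receives comments.
    comments_so_far = sum(1 for t in issue.comment_times if t <= observed_at)
    # Contextual: state of the surrounding project at the observation point.
    open_issues = sum(
        1
        for other in project_issues
        if other is not issue
        and other.created_at <= observed_at
        and (other.closed_at is None or other.closed_at > observed_at)
    )
    return {
        "body_length": body_length,
        "comments_so_far": comments_so_far,
        "open_issues_in_project": open_issues,
    }
```

Rows built this way for several observation points and horizons could then be fed to any off-the-shelf binary classifier. The choice of classifier is left open here because this page does not state which one the paper uses.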

Published In

MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016
544 pages
ISBN:9781450341868
DOI:10.1145/2901739

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. issue lifetime prediction
  2. issue tracking
  3. mining software repositories

Qualifiers

  • Research-article

Conference

ICSE '16