
Effort-aware just-in-time defect prediction: simple unsupervised models could be better than supervised models

Published: 01 November 2016

Abstract

Unsupervised models do not require defect data to build prediction models, and hence they incur a low building cost and have a wide application range. Consequently, it would be desirable for practitioners to apply unsupervised models in effort-aware just-in-time (JIT) defect prediction if these models can predict defect-inducing changes well. However, little is currently known about their prediction effectiveness in this context. We aim to investigate the predictive power of simple unsupervised models in effort-aware JIT defect prediction, especially compared with the state-of-the-art supervised models in the recent literature. We first use the most commonly used change metrics to build simple unsupervised models. Then, we compare these unsupervised models with the state-of-the-art supervised models under cross-validation, time-wise cross-validation, and across-project prediction settings to determine whether they are of practical value. The experimental results, obtained on open-source software systems, show that many simple unsupervised models perform better than the state-of-the-art supervised models in effort-aware JIT defect prediction.
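
A minimal Python sketch of the kind of simple unsupervised model the abstract describes: rank code changes by a single change metric (here, churned lines of code, smallest changes first, i.e., in descending order of the metric's reciprocal) and measure how many defect-inducing changes are caught within a fixed inspection budget. The field names, the LOC-based effort proxy, and the 20% effort cutoff are illustrative assumptions, not the paper's exact implementation.

from dataclasses import dataclass

@dataclass
class Change:
    loc: int      # churned lines of code in the change (proxy for inspection effort)
    buggy: bool   # ground-truth label: did the change induce a defect?

def rank_unsupervised(changes):
    # "Unsupervised" model: no defect labels are used; changes are simply
    # sorted in ascending order of the metric (equivalently, descending 1/LOC).
    return sorted(changes, key=lambda c: c.loc)

def recall_at_effort(ranked, effort_fraction=0.2):
    # Effort-aware evaluation: fraction of defect-inducing changes found when
    # inspecting only `effort_fraction` of the total churned LOC.
    total_effort = sum(c.loc for c in ranked)
    total_buggy = sum(c.buggy for c in ranked)
    budget = effort_fraction * total_effort
    spent, found = 0, 0
    for c in ranked:
        if spent + c.loc > budget:
            break
        spent += c.loc
        found += c.buggy
    return found / total_buggy if total_buggy else 0.0

# Usage (hypothetical data):
#   changes = [Change(loc=12, buggy=True), Change(loc=300, buggy=False)]
#   print(recall_at_effort(rank_unsupervised(changes)))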


    Published In

    FSE 2016: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering
    November 2016
    1156 pages
    ISBN: 9781450342186
    DOI: 10.1145/2950290

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Defect
    2. changes
    3. effort-aware
    4. just-in-time
    5. prediction

    Qualifiers

    • Research-article

    Conference

    FSE'16

    Acceptance Rates

    Overall Acceptance Rate: 17 of 128 submissions, 13%

