Understanding Thresholds of Software Features for Defect Prediction

Authors:
Geanderson Santos, Universidade Federal de Minas Gerais, Brazil
Adriano Veloso, Universidade Federal de Minas Gerais, Brazil
Eduardo Figueiredo, Universidade Federal de Minas Gerais, Brazil

SBES '22: Proceedings of the XXXVI Brazilian Symposium on Software Engineering, October 2022, Pages 305–310. https://doi.org/10.1145/3555228.3555269

Published: 05 October 2022

ABSTRACT

Software defect prediction is a field of study at the intersection of software engineering and machine learning. The literature has proposed numerous machine learning models that predict software defects from software data, such as commits and code metrics. However, these models are more valuable when their predictions can be understood. Otherwise, software developers cannot reason about why a model made a given prediction, which raises questions about the model's applicability in software projects. Explainable machine learning for defect prediction remains a recent research topic, which leaves room for exploration. In this paper, we present a preliminary analysis of an extensive dataset for predicting software defects. The dataset includes 47,618 classes from 53 open-source projects and covers 66 software features related to numerous aspects of the code. We contribute an explanation of how each selected software feature favors the prediction of software defects in Java projects. Our initial results suggest that developers should keep the values of some specific software features small to avoid software defects. We hope our approach can guide further discussion about explainable machine learning for defect prediction and its impact on software development.
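
To make the idea of a feature threshold concrete, the following is a minimal sketch and not the method used in the paper, which the abstract does not describe. It assumes a single numeric feature and binary defect labels, and searches for the cut point that best separates clean from defective classes by weighted Gini impurity (a one-level decision stump). The function names, the feature, and the toy values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = defective, 0 = clean)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_gini) for the split `value <= threshold`
    that minimises the size-weighted Gini impurity of the two sides."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    # Start from "no useful split": the impurity of the whole sample.
    best = (distinct[0], gini(labels))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one made-up size-like feature per class,
# and whether that class was labelled defective. Not from the paper's dataset.
feature = [10, 20, 35, 50, 80, 120, 200, 300]
defective = [0, 0, 0, 0, 1, 0, 1, 1]

threshold, impurity = best_threshold(feature, defective)
print(f"threshold={threshold}, weighted_gini={impurity}")
```

On this toy sample, every class below the selected cut point is clean, which mirrors the abstract's suggestion that keeping some feature values small is associated with fewer defects. A real analysis would need many features, many projects, and an explanation technique suited to the trained model.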

Index Terms

Understanding Thresholds of Software Features for Defect Prediction
1. Computing methodologies
  1. Machine learning
    1. Cross-validation
2. Software and its engineering

Published in
SBES '22: Proceedings of the XXXVI Brazilian Symposium on Software Engineering
October 2022
457 pages
ISBN: 9781450397353
DOI: 10.1145/3555228
Editors:
Marcelo Maia, Universidade Federal de Uberlândia (UFU), Brazil
Fabiano Dorça, Universidade Federal de Uberlândia (UFU), Brazil
Rafael Araújo, Universidade Federal de Uberlândia (UFU), Brazil
Christina von Flach, Universidade Federal da Bahia (UFBA), Brazil
Elisa Yumi Nakagawa, Universidade de São Paulo (ICMC-USP), Brazil
Edna Dias Canedo, Universidade de Brasília (UnB), Brazil
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
defect prediction
explainable machine learning
software features for defect prediction
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 147 of 427 submissions, 34%