research-article

Customization support for CBR-based defect prediction

Authors:
Elham Paikari, University of Calgary, Calgary, AB, Canada
Bo Sun, University of Calgary, Calgary, AB, Canada
Guenther Ruhe, University of Calgary, Calgary, AB, Canada
Emadoddin Livani, University of Calgary, Calgary, AB, Canada

Promise '11: Proceedings of the 7th International Conference on Predictive Models in Software Engineering, September 2011, Article No.: 16, Pages 1–10, https://doi.org/10.1145/2020390.2020406

Published: 20 September 2011

ABSTRACT

Background: The prediction performance of a case-based reasoning (CBR) model is influenced by the combination of the following parameters: (i) similarity function, (ii) number of nearest neighbor cases, (iii) weighting technique used for attributes, and (iv) solution algorithm. Each combination of the above parameters is considered as an instantiation of the general CBR-based prediction method. The selection of an instantiation for a new data set with specific characteristics (such as size, defect density and language) is called customization of the general CBR method.
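The four parameters can be read as the arguments of a single prediction routine. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the distance-based similarity, and the two solution algorithms are illustrative choices, and the attribute weights are passed in directly rather than derived by any weighting technique from the paper.

```python
import math
from typing import Callable, Sequence

# A case pairs an attribute vector with a known outcome (e.g. a defect count).
Case = tuple[Sequence[float], float]


def weighted_euclidean_similarity(a, b, weights):
    """Illustrative similarity in (0, 1]: 1 / (1 + weighted Euclidean distance)."""
    dist = math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
    return 1.0 / (1.0 + dist)


def unweighted_average(neighbors):
    """Solution algorithm: plain mean of the neighbors' outcomes."""
    return sum(outcome for _, outcome in neighbors) / len(neighbors)


def similarity_weighted_average(neighbors):
    """Alternative solution algorithm: mean weighted by similarity."""
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * outcome for sim, outcome in neighbors) / total


def cbr_predict(query, case_base: Sequence[Case], *, similarity: Callable,
                k: int, weights: Sequence[float], solve: Callable) -> float:
    """One CBR instantiation = one choice of (similarity, k, weights, solve)."""
    scored = [(similarity(query, attrs, weights), outcome)
              for attrs, outcome in case_base]
    # Keep the k most similar cases and let the solution algorithm combine them.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return solve(scored[:k])


# Hypothetical toy case base: two attributes per module, defect count as outcome.
case_base = [([1.0, 2.0], 4.0), ([1.0, 2.5], 6.0), ([9.0, 9.0], 50.0)]
prediction = cbr_predict([1.0, 2.0], case_base,
                         similarity=weighted_euclidean_similarity,
                         k=2, weights=[1.0, 1.0], solve=unweighted_average)
```

Swapping any one argument (for example `solve=similarity_weighted_average` or a different `k`) yields a different instantiation in the sense described above.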

Aims: For the purpose of defect prediction, we ask which combinations of parameters work best in which situations. Three more specific questions were studied:

(RQ1) Does one size fit all? Is one instantiation always the best?

(RQ2) If not, which individual and combined parameter settings occur most frequently in generating the best prediction results?

(RQ3) Are there context-specific rules to support the customization?

Method: In total, 120 different CBR instantiations were created and applied to 11 data sets from the PROMISE repository. Predictions were evaluated in terms of their mean magnitude of relative error (MMRE) and percentage Pred(α) of objects fulfilling a prediction quality level α. For the third research question, dependency network analysis was performed.
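The two evaluation criteria can be sketched as follows, using their common definitions: the magnitude of relative error of one prediction is |actual − predicted| / |actual|, MMRE is its mean over all objects, and Pred(α) is the percentage of objects whose relative error does not exceed α. This is an illustrative sketch only. The function names are assumptions, and raising an error for an actual value of zero is a choice made here; the abstract does not say how such cases were treated.

```python
def mre(actual, predicted):
    """Magnitude of relative error for a single prediction."""
    if actual == 0:
        # Assumption of this sketch: zero actuals are rejected, not imputed.
        raise ValueError("MRE is undefined when the actual value is zero")
    return abs(actual - predicted) / abs(actual)


def mmre(actuals, predictions):
    """Mean magnitude of relative error over all objects."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals, predictions, alpha):
    """Percentage of objects whose relative error is at most alpha."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return 100.0 * sum(1 for e in errors if e <= alpha) / len(errors)


# Hypothetical values, for illustration only.
actuals = [10.0, 20.0, 40.0]
predictions = [8.0, 25.0, 40.0]
overall_mmre = mmre(actuals, predictions)
pred_25 = pred(actuals, predictions, 0.25)
```

A lower MMRE and a higher Pred(α) both indicate better predictions, so the two criteria can rank instantiations differently.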

Results: The parameter options that occurred most frequently in the best-performing CBR instantiations were neural-network-based sensitivity analysis (as the weighting technique), un-weighted average (as the solution algorithm), and the maximum number of nearest neighbors (as the number of nearest neighbors). Using dependency network analysis, a set of recommendations for customization was provided.

Conclusion: An approach to support customization is provided. It was confirmed that application of context-specific rules across groups of similar data sets is risky and produces poor results.

Published in
Promise '11: Proceedings of the 7th International Conference on Predictive Models in Software Engineering
September 2011
145 pages
ISBN: 9781450307093
DOI: 10.1145/2020390
General Chair: Tim Menzies, WVU
Copyright © 2011 ACM
Author Tags
case-based reasoning
customization
defect prediction
dependency network analysis
instantiation
Acceptance Rates
Overall Acceptance Rate: 64 of 125 submissions, 51%