DOI: 10.1145/2020390.2020406
research-article

Customization support for CBR-based defect prediction

Published: 20 September 2011

ABSTRACT

Background: The prediction performance of a case-based reasoning (CBR) model is influenced by the combination of the following parameters: (i) similarity function, (ii) number of nearest neighbor cases, (iii) weighting technique used for attributes, and (iv) solution algorithm. Each combination of these parameters is considered an instantiation of the general CBR-based prediction method. The selection of an instantiation for a new data set with specific characteristics (such as size, defect density, and language) is called customization of the general CBR method.
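The four parameters above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `CBRInstantiation`, the inverse-distance similarity, and the two solution functions are assumptions chosen only to show how one combination of parameter choices becomes one instantiation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Case = Tuple[Vector, float]        # (attribute values, known defect count)
Scored = Tuple[float, float]       # (similarity to query, defect count)


def weighted_euclidean_similarity(a: Vector, b: Vector, weights: Vector) -> float:
    """Hypothetical similarity: 1 / (1 + weighted Euclidean distance)."""
    dist = math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    return 1.0 / (1.0 + dist)


def unweighted_average(neighbors: List[Scored]) -> float:
    """Solution algorithm: plain mean of the neighbors' defect counts."""
    return sum(d for _, d in neighbors) / len(neighbors)


def similarity_weighted_average(neighbors: List[Scored]) -> float:
    """Solution algorithm: mean of defect counts weighted by similarity."""
    total = sum(s for s, _ in neighbors)
    return sum(s * d for s, d in neighbors) / total


@dataclass(frozen=True)
class CBRInstantiation:
    """One combination of the four parameters (assumed structure)."""
    similarity: Callable[[Vector, Vector, Vector], float]  # (i)
    k: int                                                 # (ii)
    weights: Vector                                        # (iii)
    solve: Callable[[List[Scored]], float]                 # (iv)

    def predict(self, case_base: List[Case], query: Vector) -> float:
        # Score every stored case, keep the k most similar, then combine them.
        scored = sorted(
            ((self.similarity(query, attrs, self.weights), defects)
             for attrs, defects in case_base),
            key=lambda pair: pair[0],
            reverse=True,
        )
        return self.solve(scored[: self.k])


# Made-up case base, for illustration only.
case_base = [((1.0, 2.0), 4.0), ((1.0, 2.5), 6.0), ((10.0, 20.0), 50.0)]
plain = CBRInstantiation(weighted_euclidean_similarity, 2, (1.0, 1.0), unweighted_average)
weighted = CBRInstantiation(weighted_euclidean_similarity, 2, (1.0, 1.0), similarity_weighted_average)
```

Swapping any single field (for example, `solve`) yields a different instantiation over the same case base, which is the sense in which customization means choosing among such combinations.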

Aims: For the purpose of defect prediction, we investigate which combinations of parameters work best in which situations. Three more specific questions were studied:

(RQ1) Does one size fit all? Is one instantiation always the best?

(RQ2) If not, which individual and combined parameter settings occur most frequently in generating the best prediction results?

(RQ3) Are there context-specific rules to support the customization?

Method: In total, 120 different CBR instantiations were created and applied to 11 data sets from the PROMISE repository. Predictions were evaluated in terms of their mean magnitude of relative error (MMRE) and percentage Pred(α) of objects fulfilling a prediction quality level α. For the third research question, dependency network analysis was performed.
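The two evaluation measures can be sketched as follows. The abstract does not spell out the formulas, so this assumes the conventional definitions: the magnitude of relative error of one prediction is |actual - predicted| / actual, MMRE is its mean, and Pred(α) is the percentage of predictions whose relative error is at most α. The function names are illustrative, and how the study handled modules with zero actual defects (where the relative error is undefined) is not stated; the sketch simply rejects them.

```python
from typing import Sequence


def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for one prediction (assumed definition)."""
    if actual == 0:
        raise ValueError("MRE is undefined when the actual value is zero")
    return abs(actual - predicted) / abs(actual)


def mmre(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean magnitude of relative error over all predictions."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)


def pred(actuals: Sequence[float], predictions: Sequence[float], alpha: float) -> float:
    """Percentage of predictions with MRE <= alpha (alpha as a fraction, e.g. 0.25)."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= alpha)
    return 100.0 * hits / len(actuals)


# Made-up values, for illustration only.
actuals = [10.0, 20.0, 40.0]
predictions = [12.0, 15.0, 40.0]
```

Lower MMRE and higher Pred(α) both indicate better predictions, so comparing instantiations on the two measures can produce different rankings.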

Results: The most frequent parameter options among the CBR instantiations were neural-network-based sensitivity analysis (as the weighting technique), un-weighted average (as the solution algorithm), and the maximum number of nearest neighbors (as the number of nearest neighbors). Using dependency network analysis, a set of recommendations for customization was provided.

Conclusion: An approach to support customization is provided. It was confirmed that application of context-specific rules across groups of similar data sets is risky and produces poor results.


          • Published in

            Promise '11: Proceedings of the 7th International Conference on Predictive Models in Software Engineering
            September 2011
            145 pages
            ISBN:9781450307093
            DOI:10.1145/2020390

            Copyright © 2011 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 64 of 125 submissions, 51%