Abstract
We used several machine learning algorithms to predict defective modules in five NASA products: CM1, JM1, KC1, KC2, and PC1. A set of static measures was used as the predictor variables. While doing so, we observed that a large portion of the modules were small, as measured by lines of code (LOC). When we partitioned the data into subsets according to module size, we obtained higher prediction performance for the subsets containing larger modules. We also performed defect prediction for KC1 using class-level rather than method-level data, and the class-level data yielded better prediction performance than the method-level data. These findings suggest that quality assurance activities can be guided more effectively if defect prediction is performed on data from larger modules.
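The abstract describes two data manipulations: splitting modules into subsets by LOC, and replacing KC1's method-level records with class-level ones. The Python sketch below shows one way to do both with the standard library. It also includes a helper for scoring predictions within a subset. Everything here is hypothetical and not taken from the study:

- the `Module` record layout;
- the LOC cut-offs;
- the rules that a class's LOC is the sum of its methods' LOC and that a class is defective if any of its methods is;
- the use of probability of detection (pd) and false-alarm rate (pf) as the performance measures. The abstract does not name a metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Module:
    """One method-level record (hypothetical schema, not the NASA data layout)."""
    name: str
    class_name: str  # enclosing class; used only for class-level aggregation
    loc: int         # lines of code
    defective: bool


def partition_by_loc(modules, thresholds):
    """Split modules into size bins using ascending LOC cut-offs.

    With thresholds [t1, t2]: bin 0 holds loc <= t1, bin 1 holds
    t1 < loc <= t2, and bin 2 holds loc > t2.
    """
    bins = [[] for _ in range(len(thresholds) + 1)]
    for m in modules:
        # The number of cut-offs the module exceeds is its bin index.
        idx = sum(1 for t in thresholds if m.loc > t)
        bins[idx].append(m)
    return bins


def aggregate_to_class(modules):
    """Roll method-level records up to one record per class.

    Assumed rules: class LOC is the sum of its methods' LOC, and a class
    is defective if any of its methods is defective.
    """
    loc = defaultdict(int)
    defective = defaultdict(bool)
    for m in modules:
        loc[m.class_name] += m.loc
        defective[m.class_name] |= m.defective
    # One record per class, in first-seen order.
    return [Module(c, c, loc[c], defective[c]) for c in loc]


def pd_pf(actual, predicted):
    """Probability of detection (recall on defective modules) and false-alarm rate."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if p and not a)
    tn = sum(1 for a, p in pairs if not a and not p)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf
```

To compare subsets, one would train and test a classifier on each bin returned by `partition_by_loc`, or on the output of `aggregate_to_class`, and then call `pd_pf` on each bin's predictions. Any cut-offs passed to `partition_by_loc` are illustrative only. The abstract does not state which LOC boundaries the study used.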
An investigation of the effect of module size on defect prediction using static measures. In PROMISE '05: Proceedings of the 2005 workshop on Predictor models in software engineering.