Abstract
As the application layer comes to dominate over the hardware in embedded systems, ensuring software quality becomes a real challenge. Software testing is the most time-consuming and costly project phase, particularly in the embedded software domain. Misclassifying safe code as defective increases project cost and hence lowers margins. In this research, we present a defect prediction model based on an ensemble of classifiers. We have collaborated with an industrial partner from the embedded systems domain and apply our generic defect prediction models to data coming from embedded projects. Embedded systems resemble mission-critical software in that the goal is to catch as many defects as possible; the expectation from a predictor is therefore a very high probability of detection (pd). On the other hand, most embedded systems in practice are commercial products, and companies would like to lower their costs to remain competitive in their market by keeping their probability of false alarm (pf) as low as possible and improving their precision rates. In our experiments, we used data collected from our industry partner as well as publicly available data. Our results reveal that an ensemble of classifiers significantly decreases pf down to 15% while increasing precision by 43%, keeping balance rates at 74%. A cost-benefit analysis of the proposed model shows that inspecting 23% of the code on the local datasets is enough to detect around 70% of the defects.
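The abstract reports its results in terms of pd, pf, precision, and balance. As an illustration only (not the authors' code), these measures can be derived from a confusion matrix; the balance definition used below is the one common in the defect-prediction literature (distance from the ideal point pd = 1, pf = 0) and is an assumption here, as are the example counts.

```python
import math

def classifier_metrics(tp, fn, fp, tn):
    """Defect-prediction metrics from confusion-matrix counts.

    tp/fn: defective modules flagged / missed by the predictor,
    fp/tn: safe modules flagged / correctly passed.
    """
    pd = tp / (tp + fn)          # probability of detection (recall)
    pf = fp / (fp + tn)          # probability of false alarm
    precision = tp / (tp + fp)   # flagged modules that are truly defective
    # balance: normalized distance from the ideal point (pd=1, pf=0)
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return pd, pf, precision, balance

# Hypothetical counts chosen so that pd = 0.70 and pf = 0.15,
# mirroring the rates quoted in the abstract:
pd, pf, precision, balance = classifier_metrics(tp=70, fn=30, fp=15, tn=85)
print(f"pd={pd:.2f} pf={pf:.2f} precision={precision:.2f} balance={balance:.2f}")
```

The balance metric rewards predictors that are simultaneously close to pd = 1 and pf = 0, which matches the trade-off the abstract describes: catching most defects while keeping false alarms (and hence inspection cost) low.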
Acknowledgments
This research is supported in part by the Turkish State Planning Organization (DPT) under project number 2007K120610.
Cite this article
Mısırlı, A.T., Bener, A.B. & Turhan, B. An industrial case study of classifier ensembles for locating software defects. Software Qual J 19, 515–536 (2011). https://doi.org/10.1007/s11219-010-9128-1