An industrial case study of classifier ensembles for locating software defects

Abstract

As the application layer comes to dominate the hardware in embedded systems, ensuring software quality becomes a real challenge. Software testing is the most time-consuming and costly project phase, particularly in the embedded software domain. Misclassifying defect-free code as defective increases project costs and hence lowers margins. In this research, we present a defect prediction model based on an ensemble of classifiers. We have collaborated with an industrial partner from the embedded systems domain and applied our generic defect prediction models to data from its embedded projects. Embedded systems resemble mission-critical software in that the goal is to catch as many defects as possible, so a predictor is expected to achieve a very high probability of detection (pd). On the other hand, most embedded systems are commercial products, and companies want to lower their costs to remain competitive by keeping their false alarm rates (pf) as low as possible and improving their precision. In our experiments, we used data collected from our industry partner as well as publicly available data. Our results reveal that an ensemble of classifiers significantly decreases pf to 15% while increasing precision by 43%, keeping the balance rate at 74%. The cost-benefit analysis of the proposed model shows that inspecting 23% of the code in the local datasets is enough to detect around 70% of the defects.
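To make the abstract's evaluation criteria concrete, the sketch below is a minimal, hypothetical illustration of the general approach: a majority-voting ensemble of classifiers whose predictions are scored with pd (probability of detection), pf (probability of false alarm), precision, and balance, which the defect prediction literature commonly defines as the normalized Euclidean distance from the ideal ROC point (pd = 1, pf = 0). The base learners (Naive Bayes, a decision tree, k-nearest neighbors), the scikit-learn API, and the synthetic dataset are illustrative assumptions, not the configuration used in the study.

```python
# Hypothetical sketch: majority-voting ensemble scored with pd, pf,
# precision, and balance. Base learners and data are stand-ins, not
# the paper's actual setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def detection_metrics(y_true, y_pred):
    """Compute pd, pf, precision, and balance for binary predictions.

    pd = TP / (TP + FN); pf = FP / (FP + TN)
    balance = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2), i.e. one minus the
    normalized distance from the ideal ROC point (pd = 1, pf = 0).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    precision = tp / (tp + fp)
    balance = 1 - np.sqrt(pf ** 2 + (1 - pd) ** 2) / np.sqrt(2)
    return pd, pf, precision, balance


# Stand-in for modules described by static code metrics and labeled
# defective (1) / defect-free (0); defect data is typically imbalanced.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Majority vote over three dissimilar base learners (illustrative choices).
ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="hard",
)
ensemble.fit(X_train, y_train)

pd_, pf_, prec, bal = detection_metrics(y_test, ensemble.predict(X_test))
print(f"pd={pd_:.2f} pf={pf_:.2f} precision={prec:.2f} balance={bal:.2f}")
```

Under this scoring, lowering pf and raising pd jointly raise balance, which is why the abstract reports the three together: an ensemble can trade a small loss in pd for a large drop in pf, improving precision on imbalanced defect data.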



Acknowledgments

This research is supported in part by the Turkish State Planning Organization (DPT) under project number 2007K120610.

Author information

Correspondence to Ayşe Tosun Mısırlı.


Cite this article

Mısırlı, A.T., Bener, A.B. & Turhan, B. An industrial case study of classifier ensembles for locating software defects. Software Qual J 19, 515–536 (2011). https://doi.org/10.1007/s11219-010-9128-1
