Abstract
New methodologies and tools have gradually made the software development life cycle less dependent on humans. Much of the research in this field focuses on defect reduction, defect identification, and defect prediction. Defect prediction is a relatively new research area that employs methods ranging from artificial intelligence to data mining. Identifying and locating defects in software projects is a difficult task. Measuring software in a continuous and disciplined manner offers many advantages, such as accurate estimation of project costs and schedules and improved product and process quality. This study proposes a model to predict the number of defects in a new version of a software product relative to the previous stable version. The new version may contain changes related to a new feature, a modification to an algorithm, or bug fixes. Our model predicts the defects introduced into the new version by analyzing the types of changes in an objective and formal manner, as well as the change in lines of code (LOC). Defect predictors are helpful tools for both project managers and developers. Accurate predictors may help reduce test times and guide developers towards writing higher-quality code. Our model can aid software engineers in assessing the stability of software before it goes into production. Furthermore, such a model may provide useful insight into the effects of a feature, bug fix, or other change on defect detection.
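The idea of predicting defects from change size and change type can be illustrated with a minimal sketch. This is not the authors' actual model; the training data, change-type weights, and feature choices below are illustrative assumptions. It fits a least-squares slope relating historical LOC churn to post-release defect counts, then scales the estimate by an assumed multiplier per change type.

```python
# Minimal sketch of a churn-based defect predictor (illustrative only;
# the data and weights are assumptions, not the authors' model).

def fit_slope(loc, defects):
    """Least-squares line through the origin: defects ~ slope * loc."""
    num = sum(x * y for x, y in zip(loc, defects))
    den = sum(x * x for x in loc)
    return num / den

# Hypothetical history: (LOC changed in a version, defects found later).
history = [(120, 3), (450, 11), (80, 2), (900, 25), (300, 7)]
slope = fit_slope([h[0] for h in history], [h[1] for h in history])

# Assumed change-type multipliers: algorithm changes are taken to inject
# more defects per changed line than pure bug fixes.
TYPE_WEIGHT = {"feature": 1.0, "algorithm": 1.3, "bugfix": 0.6}

def predict_defects(loc_changed, change_type):
    """Estimate defects introduced by a change of the given size and type."""
    return slope * loc_changed * TYPE_WEIGHT[change_type]

print(round(predict_defects(500, "feature"), 1))
```

A real model in this spirit would learn both the churn coefficient and the change-type effects from version history rather than fixing the multipliers by hand.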
Acknowledgements
This work is supported in part by the Boğaziçi University research fund under grant number BAP–06HA104. Special thanks to our colleague Burak Turhan for his valuable comments on the manuscript. We also thank Ms. Cigdem Aksoy Fromm, who did the final editing of the manuscript.
Cite this article
Kastro, Y., Bener, A.B. A defect prediction method for software versioning. Software Qual J 16, 543–562 (2008). https://doi.org/10.1007/s11219-008-9053-8