An empirical study of factors affecting cross-project aging-related bug prediction with TLAP

Software Quality Journal

Abstract

Software aging is a phenomenon in which long-running software systems exhibit an increasing failure rate and/or progressive performance degradation. Because of their nature, Aging-Related Bugs (ARBs) are hard to discover during software testing and difficult to reproduce. Automatically predicting ARBs before software release can therefore help developers reduce their impact or avoid them altogether. Many bug prediction approaches have been proposed, and most of them are effective in within-project prediction settings. However, because ARBs are rare and difficult to reproduce, it is usually hard to collect enough training data to build an accurate within-project prediction model. A recent work proposed a method named Transfer Learning based Aging-related bug Prediction (TLAP) for cross-project ARB prediction. Although this method considerably improves cross-project ARB prediction performance, its prediction results have been observed to depend on several key factors, such as the normalization method, the kernel function, and the machine learning classifier. This paper therefore presents the first empirical study of how these factors affect the effectiveness of cross-project ARB prediction, analyzing single-factor, bigram, and triplet patterns and validating the results with the Scott-Knott test. We find that kernel functions and classifiers are key factors affecting the effectiveness of cross-project ARB prediction, whereas normalization methods show no statistically significant influence. In addition, the ordering of values in the three single-factor patterns is largely preserved in the three bigram patterns and in the triplet pattern. Similarly, the ordering of values in the three bigram patterns is also preserved in the triplet pattern.
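The abstract names normalization methods, kernel functions, and classifiers as the factors under study. The Python sketch below illustrates what the first two factors look like in code and how the three factors combine into a grid of (normalization, kernel, classifier) configurations. It assumes common textbook choices (min-max and z-score normalization; linear, polynomial, and RBF kernels) that are not necessarily the exact set evaluated in the paper, and the classifier names are hypothetical placeholders. It is not an implementation of TLAP itself.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Kernel = Callable[[Sequence[float], Sequence[float]], float]


# Normalization methods (assumed examples), applied per feature column.
def min_max(column: Sequence[float]) -> Vector:
    """Rescale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def z_score(column: Sequence[float]) -> Vector:
    """Shift to mean 0 and scale by the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [0.0 if std == 0 else (x - mean) / std for x in column]


def normalize_rows(rows: Sequence[Sequence[float]],
                   method: Callable[[Sequence[float]], Vector]) -> List[Vector]:
    """Apply a column-wise normalization method to a row-major feature matrix."""
    columns = [method(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Kernel functions (assumed examples): similarity between two samples.
def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x: Sequence[float], y: Sequence[float],
                      degree: int = 2, coef0: float = 1.0) -> float:
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(rows: Sequence[Sequence[float]], kernel: Kernel) -> List[Vector]:
    """Pairwise kernel values K[i][j] = kernel(rows[i], rows[j])."""
    return [[kernel(r, s) for s in rows] for r in rows]


NORMALIZATIONS: Dict[str, Callable[[Sequence[float]], Vector]] = {
    "min-max": min_max,
    "z-score": z_score,
}
KERNELS: Dict[str, Kernel] = {
    "linear": linear_kernel,
    "polynomial": polynomial_kernel,
    "rbf": rbf_kernel,
}
# Hypothetical placeholder labels, not the classifiers used in the study.
CLASSIFIERS: List[str] = ["classifier_a", "classifier_b"]


def configurations() -> List[Tuple[str, str, str]]:
    """Every (normalization, kernel, classifier) triplet in the factor grid."""
    return list(itertools.product(NORMALIZATIONS, KERNELS, CLASSIFIERS))


if __name__ == "__main__":
    features = [[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]]
    rows = normalize_rows(features, NORMALIZATIONS["min-max"])
    print(rows)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    K = kernel_matrix(rows, KERNELS["rbf"])
    print(round(K[0][2], 4))  # 0.1353
    print(len(configurations()))  # 12
```

Each triplet returned by `configurations()` is one candidate setting. Grouping evaluation results by one, two, or all three components of the triplet is one plausible way to read the single-factor, bigram, and triplet patterns, although the paper's exact aggregation procedure may differ.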

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Funding

This work was supported by the State Key Laboratory of Software Development Environment under Grant SKLSDE-2018ZX-09, National Natural Science Foundation of China under Grant 61772055 and Grant 61872169, and the Technical Foundation Project of Ministry of Industry and Information Technology of China under Grant JSZL2016601B003.

Author information

Corresponding author

Correspondence to Beibei Yin.

About this article

Cite this article

Qin, F., Wan, X. & Yin, B. An empirical study of factors affecting cross-project aging-related bug prediction with TLAP. Software Qual J 28, 107–134 (2020). https://doi.org/10.1007/s11219-019-09460-7

DOI: https://doi.org/10.1007/s11219-019-09460-7
