An empirical study of factors affecting cross-project aging-related bug prediction with TLAP

Qin, Fangyun; Wan, Xiaohui; Yin, Beibei

doi:10.1007/s11219-019-09460-7

Published: 16 October 2019

Software Quality Journal, Volume 28, pages 107–134 (2020)

Fangyun Qin^1,2,
Xiaohui Wan^1,2 &
Beibei Yin^1,2

Abstract

Software aging is a phenomenon in which long-running software systems show an increasing failure rate and/or progressive performance degradation. By their nature, Aging-Related Bugs (ARBs) are hard to discover during software testing and challenging to reproduce. Automatically predicting ARBs before software release can therefore help developers reduce the impact of ARBs or avoid them altogether. Many bug prediction approaches have been proposed, and most of them are effective in within-project prediction settings. However, because ARBs are rare and difficult to reproduce, it is usually hard to collect sufficient training data to build an accurate prediction model. A recent work proposed a method named Transfer Learning based Aging-related bug Prediction (TLAP) for performing cross-project ARB prediction. Although this method considerably improves cross-project ARB prediction performance, its prediction results have been observed to be affected by several key factors, such as the normalization method, kernel function, and machine learning classifier. This paper therefore presents the first empirical study of the impact of these factors on the effectiveness of cross-project ARB prediction, in terms of single-factor, bigram, and triplet patterns, and validates the results with the Scott-Knott test. We find that kernel functions and classifiers are key factors affecting the effectiveness of cross-project ARB prediction, while normalization methods show no statistically significant influence. In addition, the order of values in the three single-factor patterns is largely maintained in the three bigram patterns and the one triplet pattern. Similarly, the order of values in the three bigram patterns is also maintained in the triplet pattern.
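
The pattern counts in the abstract (three single-factor patterns, three bigram patterns, one triplet pattern) follow from taking every non-empty subset of the three factors. The sketch below only illustrates that enumeration. The factor levels listed in `FACTORS` are hypothetical placeholders, since the abstract names the factors but not their values, and `patterns` is an illustrative helper rather than anything from the paper.

```python
from itertools import combinations, product

# Hypothetical factor levels (assumption): the abstract names only the
# three factors, not the concrete values studied.
FACTORS = {
    "normalization": ["min-max", "z-score"],
    "kernel": ["linear", "rbf"],
    "classifier": ["naive-bayes", "logistic-regression"],
}


def patterns(factors):
    """Group factor combinations by pattern size.

    Returns {size: {factor-name tuple: list of value tuples}}, where
    size 1 is single-factor, 2 is bigram, and 3 is triplet.
    """
    result = {}
    for size in range(1, len(factors) + 1):
        result[size] = {
            names: list(product(*(factors[name] for name in names)))
            for names in combinations(factors, size)
        }
    return result


if __name__ == "__main__":
    for size, groups in patterns(FACTORS).items():
        for names, values in groups.items():
            print(size, names, len(values))
```

With three factors this yields three groups of size one, three of size two, and one of size three. Each value tuple would correspond to one configuration to evaluate and rank.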



Funding

This work was supported by the State Key Laboratory of Software Development Environment under Grant SKLSDE-2018ZX-09, National Natural Science Foundation of China under Grant 61772055 and Grant 61872169, and the Technical Foundation Project of Ministry of Industry and Information Technology of China under Grant JSZL2016601B003.

Author information

Authors and Affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
Fangyun Qin, Xiaohui Wan & Beibei Yin
School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Fangyun Qin, Xiaohui Wan & Beibei Yin


Corresponding author

Correspondence to Beibei Yin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Qin, F., Wan, X. & Yin, B. An empirical study of factors affecting cross-project aging-related bug prediction with TLAP. Software Qual J 28, 107–134 (2020). https://doi.org/10.1007/s11219-019-09460-7


Published: 16 October 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s11219-019-09460-7
