Cross-project bug type prediction based on transfer learning

Du, Xiaoting; Zhou, Zenghui; Yin, Beibei; Xiao, Guanping

doi:10.1007/s11219-019-09467-0

Published: 16 September 2019

Volume 28, pages 39–57 (2020)

Software Quality Journal

Xiaoting Du (ORCID: orcid.org/0000-0002-1609-0480)¹, Zenghui Zhou¹, Beibei Yin¹ & Guanping Xiao¹

Abstract

The prediction of bug types provides useful insights into the software maintenance process. It can improve the efficiency of software testing and help developers adopt appropriate strategies to fix bugs before releasing software projects. Bug type prediction is typically performed with machine learning classifiers, which rely heavily on labeled data. However, when a software project has insufficient labeled data, it is difficult to train a classification model to predict its bug types. Although labeled data from other projects can be used as training data, cross-project prediction results are often poor. To address this problem, this paper proposes a cross-project bug type prediction framework based on transfer learning. Unlike traditional machine learning methods, transfer learning does not assume that the training set and the test set follow the same distribution. Our experiments show that adopting transfer learning significantly improves the results of cross-project bug type prediction. In addition, we study the factors that influence the prediction results, including the choice of source and target project pairs and the number of bug reports in the source project.
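The abstract does not state which transfer learning algorithm the framework uses. As one illustration of how instance-based transfer learning relaxes the same-distribution assumption, the sketch below implements TrAdaBoost-style reweighting (Dai et al., ICML 2007) with weighted decision stumps in plain Python. Source-project examples that keep disagreeing with the few labeled target-project examples are gradually down-weighted, while misclassified target examples are up-weighted. This is a sketch under stated assumptions, not the authors' implementation: the single numeric feature, the toy data, and the function names (`fit_stump`, `tradaboost`, `tradaboost_predict`) are hypothetical, and a real pipeline would first turn bug-report text into feature vectors.

```python
import math


def stump_predict(stump, row):
    """Predict 0/1 with a decision stump (feature j, threshold, polarity)."""
    j, thr, polarity = stump
    above = row[j] >= thr
    return int(above) if polarity == 1 else int(not above)


def fit_stump(X, y, p):
    """Return the stump that minimises the p-weighted 0/1 error on (X, y)."""
    best_err, best = math.inf, (0, X[0][0], 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, thr, polarity)
                err = sum(pi for row, yi, pi in zip(X, y, p)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best = err, stump
    return best


def tradaboost(Xs, ys, Xt, yt, n_iter=10):
    """TrAdaBoost-style instance reweighting for labels in {0, 1}.

    Illustrative sketch only (hypothetical helper), not the paper's method.
    Xs, ys: labeled source-project examples (different distribution).
    Xt, yt: the few labeled target-project examples.
    """
    n, m = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0] * (n + m)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_iter))
    rounds = []
    for _ in range(n_iter):
        total = sum(w)
        h = fit_stump(X, y, [wi / total for wi in w])
        miss = [abs(stump_predict(h, row) - yi) for row, yi in zip(X, y)]
        # The round's error is measured on target-project examples only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # keep beta_t strictly in (0, 1)
        beta_t = eps / (1.0 - eps)
        rounds.append((h, beta_t))
        for i in range(n + m):
            if i < n:  # source example: shrink weight if misclassified
                w[i] *= beta ** miss[i]
            else:      # target example: grow weight if misclassified
                w[i] *= beta_t ** -miss[i]
    # Only the second half of the rounds votes in the final hypothesis.
    return rounds[math.ceil(n_iter / 2) - 1:]


def tradaboost_predict(rounds, row):
    """Predict 1 if the log(1/beta_t)-weighted votes for 1 reach half
    of the total vote weight."""
    score = sum(-math.log(bt) * stump_predict(h, row) for h, bt in rounds)
    half = 0.5 * sum(-math.log(bt) for _, bt in rounds)
    return int(score >= half)


# Hypothetical toy data: one numeric feature per bug report.
# The source project labels x >= 5 as 1; the target project uses x >= 7.
Xs = [[float(i)] for i in range(10)]
ys = [int(i >= 5) for i in range(10)]
Xt = [[1.0], [3.0], [5.0], [6.0], [8.0], [9.0]]
yt = [0, 0, 0, 0, 1, 1]
X_new = [[5.0], [6.0], [8.0], [3.0]]

baseline = fit_stump(Xs, ys, [1.0 / len(Xs)] * len(Xs))  # source-only model
rounds = tradaboost(Xs, ys, Xt, yt, n_iter=10)
print([stump_predict(baseline, r) for r in X_new])     # [1, 1, 1, 0]
print([tradaboost_predict(rounds, r) for r in X_new])  # [0, 0, 1, 0]
```

On this toy data, the stump trained on the source project alone keeps the source boundary (x ≥ 5) and mislabels the target-style inputs 5 and 6, whereas the reweighted ensemble recovers the target boundary (x ≥ 7) from only six labeled target examples.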

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61772055 and Grant 61872169, in part by the Technical Foundation Project of the Ministry of Industry and Information Technology of China under Grant JSZL2016601B003, and in part by the State Key Laboratory of Software Development Environment under Grant SKLSDE-2018ZX-09.

Author information

Authors and Affiliations

School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
Xiaoting Du, Zenghui Zhou, Beibei Yin & Guanping Xiao

Corresponding author

Correspondence to Xiaoting Du.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Du, X., Zhou, Z., Yin, B. et al. Cross-project bug type prediction based on transfer learning. Software Qual J 28, 39–57 (2020). https://doi.org/10.1007/s11219-019-09467-0

Issue Date: March 2020
DOI: https://doi.org/10.1007/s11219-019-09467-0
