
MHCPDP: multi-source heterogeneous cross-project defect prediction via multi-source transfer learning and autoencoder

Published in: Software Quality Journal

Abstract

Heterogeneous cross-project defect prediction (HCPDP) aims to build a defect prediction model for a target project by reusing datasets from source projects whose features differ from those of the target project. Most existing HCPDP methods only remove redundant or unrelated features without exploring the underlying features of cross-project datasets. Moreover, when transfer learning is applied in HCPDP, these methods ignore the negative effects of transfer. In this paper, we propose a novel HCPDP method called multi-source heterogeneous cross-project defect prediction (MHCPDP). To reduce the gap between the target and source datasets, MHCPDP uses an autoencoder to extract intermediate features from the original datasets instead of simply removing redundant and unrelated features, and adopts a modified autoencoder algorithm to perform instance selection, eliminating irrelevant instances from the source-domain datasets. Furthermore, by incorporating multiple source projects to increase the number of source datasets, MHCPDP develops a multi-source transfer learning algorithm to reduce the impact of negative transfer and improve classifier performance. We comprehensively evaluate MHCPDP on five open-source datasets; the experimental results show that MHCPDP not only achieves significant improvements on two performance metrics but also overcomes the shortcomings of conventional HCPDP methods.
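To make the pipeline concrete, the sketch below (in PyTorch) illustrates the general idea rather than the authors' implementation: each project gets its own autoencoder that compresses its heterogeneous metric set into intermediate features of a common size, reconstruction error filters out source instances the autoencoder cannot represent well, and per-source classifiers are combined on the target project. The network sizes, keep_ratio, logistic-regression base learner, and uniform ensemble weights are illustrative assumptions; MHCPDP's actual feature alignment and negative-transfer weighting scheme are not reproduced here.

```python
# Minimal sketch (not the authors' code): per-project autoencoders for
# intermediate feature extraction, reconstruction-error-based instance
# selection, and a naive uniform-weight multi-source ensemble.
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression


class Autoencoder(nn.Module):
    """Encoder compresses one project's metrics into `latent_dim`
    intermediate features; decoder reconstructs the original metrics."""

    def __init__(self, in_dim: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


def fit_autoencoder(X, latent_dim=16, epochs=200, lr=1e-3):
    """Fit an autoencoder on a single project's metric matrix (unsupervised)."""
    x = torch.tensor(X, dtype=torch.float32)
    model = Autoencoder(X.shape[1], latent_dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        x_hat, _ = model(x)
        loss = nn.functional.mse_loss(x_hat, x)
        loss.backward()
        opt.step()
    return model


def encode(model, X):
    """Intermediate features used in place of the raw, heterogeneous metrics."""
    with torch.no_grad():
        _, z = model(torch.tensor(X, dtype=torch.float32))
    return z.numpy()


def select_instances(model, X, keep_ratio=0.8):
    """Keep the source instances with the smallest reconstruction error,
    treating poorly reconstructed rows as irrelevant to the latent space."""
    with torch.no_grad():
        x = torch.tensor(X, dtype=torch.float32)
        x_hat, _ = model(x)
        err = ((x_hat - x) ** 2).mean(dim=1).numpy()
    return np.argsort(err)[: int(keep_ratio * len(X))]


def predict(sources, X_target, latent_dim=16):
    """`sources` is a list of (X, y) numpy pairs from different projects.
    Each source gets its own autoencoder and classifier; predictions are
    averaged with uniform weights, whereas the paper learns weights to
    suppress negative transfer and aligns the latent representations."""
    target_ae = fit_autoencoder(X_target, latent_dim)
    z_target = encode(target_ae, X_target)
    probs = []
    for X_s, y_s in sources:
        ae = fit_autoencoder(X_s, latent_dim)
        keep = select_instances(ae, X_s)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(encode(ae, X_s)[keep], y_s[keep])
        probs.append(clf.predict_proba(z_target)[:, 1])
    return (np.mean(probs, axis=0) >= 0.5).astype(int)
```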




Funding

This work was supported in part by National Key Research and Development Project under grant 2019YFB1706101 and in part by the Science-Technology Foundation of Chongqing, China, under grant cstc2019jscx-mbdx0083.

Author information


Corresponding author

Correspondence to Yingbo Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The datasets and the source code of the proposed approach used to conduct this study are available at https://github.com/SE-CQU/sdp.


About this article


Cite this article

Wu, J., Wu, Y., Niu, N. et al. MHCPDP: multi-source heterogeneous cross-project defect prediction via multi-source transfer learning and autoencoder. Software Quality Journal 29, 405–430 (2021). https://doi.org/10.1007/s11219-021-09553-2

