Combining Deep Learning and Kernel PCA for Software Defect Prediction

Authors:
Anh Ho (ORCID 0000-0001-7483-7119), Nguyen Nhat Hai (ORCID 0000-0002-7724-3612), Bui Thi-Mai-Anh (ORCID 0000-0001-7877-9438)

School of Information and Communication Technology, Hanoi University of Science and Technology, Viet Nam

SoICT '22: Proceedings of the 11th International Symposium on Information and Communication Technology, December 2022, Pages 360–367. https://doi.org/10.1145/3568562.3568587

Published: 01 December 2022

ABSTRACT

Software defect prediction aims to automatically determine the most likely locations of defective program elements (e.g., statements, methods, classes, or modules). Previous studies mainly rely on design features, such as source code complexity and object-oriented design metrics, to classify program elements into two categories: (i) defective and (ii) non-defective. Although these approaches have obtained promising results, two significant challenges remain in this research field: (i) removing irrelevant and redundant information from the design features; and (ii) reducing the impact of skewed data distribution on learning models. In this paper, we address these two issues by first applying kernel PCA to extract essential information from the design features and then proposing a deep neural network model that captures the non-linear relationships among features. To mitigate class imbalance, we apply a weighted loss function combined with a bootstrapping method in the batch training mechanism of our model. We conducted experiments to assess the performance of the proposed approach on the NASA (10 projects) and PROMISE (34 projects) datasets. To assess the effectiveness of kernel PCA for software defect prediction, we also compared it with traditional feature selection approaches on the high-dimensional ECLIPSE dataset. The empirical results show that the proposed method outperforms the compared state-of-the-art models in predicting defective source files.
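The abstract does not spell out how the weighted loss and the bootstrapped batches are computed, so the following is only a minimal sketch of one common way to combine the two ideas, not the authors' implementation. The inverse-frequency weighting, the half-and-half batch composition, and every function name (`class_weights`, `weighted_bce`, `bootstrap_batch`) are assumptions made for illustration. The kernel PCA step and the network itself are left out.

```python
import math
import random


def class_weights(labels):
    """Inverse-frequency weights: the rarer class gets the larger weight.

    Assumption: weight = n / (2 * class_count), a common "balanced" heuristic.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


def weighted_bce(probs, labels, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, averaged over the batch."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(labels)


def bootstrap_batch(labels, batch_size, rng):
    """Sample indices with replacement, half from each class (assumed ratio)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    half = batch_size // 2
    batch = rng.choices(pos, k=half) + rng.choices(neg, k=batch_size - half)
    rng.shuffle(batch)
    return batch


# Toy data: 10 defective and 90 non-defective elements.
labels = [1] * 10 + [0] * 90
weights = class_weights(labels)
rng = random.Random(0)
batch = bootstrap_batch(labels, 16, rng)
batch_labels = [labels[i] for i in batch]
```

With this toy split, the defective class receives a weight of 5.0 and the non-defective class roughly 0.56, so a missed defect contributes about nine times more to the loss than an equally confident false alarm. Each bootstrapped batch of 16 contains 8 defective samples, whatever the overall class ratio.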

Index Terms

Software and its engineering → Software creation and management → Software verification and validation → Software defect analysis

Published in

SoICT '22: Proceedings of the 11th International Symposium on Information and Communication Technology
December 2022
474 pages
ISBN: 9781450397254
DOI: 10.1145/3568562

Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep neural network
feature reduction
kernel PCA
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 147 of 318 submissions, 46%
Article Metrics
- Total Citations: 1
- Total Downloads: 76
- Downloads (Last 12 months): 44
- Downloads (Last 6 weeks): 1