ABSTRACT
Software defect prediction aims to automatically determine the most likely location of defective program elements (i.e., statements, methods, classes, modules, etc.). Previous studies on software defect prediction have mainly focused on designing features, such as source code complexity and object-oriented design metrics, to classify program elements into two categories: (i) defective and (ii) non-defective. Although these approaches have obtained promising results, two significant challenges remain in this research field: (i) removing irrelevant and redundant information from the designed features; and (ii) reducing the impact of skewed data distribution on learning models. In this paper, we address these two issues by first applying kernel PCA to extract essential information from the designed features and then proposing a deep neural network model that captures the non-linear relationships among features. To mitigate class imbalance, we apply a weighted loss function combined with a bootstrapping method in the batch training mechanism of our model. We conducted experiments to assess the performance of our proposed approach on the NASA (10 projects) and PROMISE (34 projects) datasets. To evaluate the effectiveness of kernel PCA in software defect prediction, we compared it with several traditional feature selection approaches on the high-dimensional ECLIPSE dataset. The empirical results show that our proposed method outperforms other state-of-the-art models in predicting defective source files.
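The abstract does not say how the weighted loss or the bootstrapped batches are built, so the following is only a minimal sketch under stated assumptions: inverse-frequency class weights, a class-weighted binary cross-entropy, and batches drawn with replacement so that both classes are equally represented. The function names `class_weights`, `weighted_bce`, and `bootstrap_batch` are hypothetical and are not taken from the paper.

```python
import math
import random


def class_weights(labels):
    """Inverse-frequency weights n / (2 * count_c); assumes both classes occur."""
    n = len(labels)
    counts = {c: labels.count(c) for c in (0, 1)}
    return {c: n / (2 * counts[c]) for c in counts}


def weighted_bce(y_true, y_prob, weights, eps=1e-12):
    """Mean binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


def bootstrap_batch(labels, batch_size, rng):
    """Draw a batch of indices with replacement, half from each class (assumed scheme)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    half = batch_size // 2
    batch = rng.choices(pos, k=half) + rng.choices(neg, k=batch_size - half)
    rng.shuffle(batch)
    return batch


# Toy imbalanced label set: 10 defective, 90 non-defective elements.
labels = [1] * 10 + [0] * 90
weights = class_weights(labels)
batch = bootstrap_batch(labels, batch_size=8, rng=random.Random(0))
```

With these assumed weights, a misclassified defective element contributes more to the loss than a misclassified non-defective one, while the resampled batches keep the minority class visible at every training step. In practice the features fed to the network would first be transformed by kernel PCA, which is omitted here.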
Index Terms
- Combining Deep Learning and Kernel PCA for Software Defect Prediction