DOI: 10.1145/3568562.3568587
research-article

Combining Deep Learning and Kernel PCA for Software Defect Prediction

Published: 01 December 2022

ABSTRACT

Software defect prediction aims to automatically determine the most likely locations of defective program elements (e.g., statements, methods, classes, or modules). Previous studies on software defect prediction mainly explore design features, such as source code complexity and object-oriented design metrics, to classify program elements into two categories: (i) defective and (ii) non-defective. Although these approaches have obtained promising results, two significant challenges remain in this research field: (i) removing irrelevant and redundant information from design structures; and (ii) reducing the impact of skewed data distributions on learning models. In this paper, we address these two issues by first applying kernel PCA to extract essential information from the design features and then proposing a deep neural network model that captures the non-linear relationships among features. To mitigate class imbalance, we apply a weighted loss function combined with a bootstrapping method within the batch training mechanism of our model. We conducted experiments to assess the performance of the proposed approach on the NASA (10 projects) and PROMISE (34 projects) datasets. To evaluate the effectiveness of kernel PCA in software defect prediction, we compared it with traditional feature selection approaches on the high-dimensional ECLIPSE dataset. The empirical results show that the proposed method outperforms other state-of-the-art models at predicting defective source files.
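
The class-imbalance handling described above can be illustrated with a minimal sketch. The abstract does not state the weighting scheme or the bootstrapping procedure, so everything here is an assumption made for illustration: inverse-frequency class weights, a weighted binary cross-entropy, and batches drawn with replacement so that each class fills half of the batch. The function names (`class_weights`, `weighted_bce`, `bootstrap_batch`) are hypothetical and do not come from the paper.

```python
import math
import random


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Assumed scheme; the paper's actual weighting is not given in the abstract.
    """
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_bce(y_true, y_prob, weights, eps=1e-12):
    """Mean binary cross-entropy, each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(y_true)


def bootstrap_batch(labels, batch_size, rng):
    """Sample indices with replacement, half from each class (assumed procedure)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    half = batch_size // 2
    batch = rng.choices(pos, k=half) + rng.choices(neg, k=batch_size - half)
    rng.shuffle(batch)
    return batch


# Toy data: 10 defective (1) and 90 non-defective (0) elements.
labels = [1] * 10 + [0] * 90
weights = class_weights(labels)
rng = random.Random(0)
batch = bootstrap_batch(labels, 16, rng)
```

With these weights, a misclassified defective element contributes more to the loss than a misclassified non-defective one, and every batch contains both classes even when defects are rare. In practice the two mechanisms would normally be tuned together, since balanced batches already reduce how much extra weight the minority class needs.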


Published in

SoICT '22: Proceedings of the 11th International Symposium on Information and Communication Technology
December 2022
474 pages
ISBN: 9781450397254
DOI: 10.1145/3568562

Copyright © 2022 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 147 of 318 submissions, 46%
