
The Effect of Class Noise on Continuous Test Case Selection: A Controlled Experiment on Industrial Data

  • Conference paper
Product-Focused Software Process Improvement (PROFES 2020)

Abstract

Continuous integration and testing produce a large amount of data about defects in code revisions, which can be used to train a predictive learner that selects an effective subset of test suites. One challenge in using predictive learners lies in the noise present in the training data, which often degrades classification performance. This study examines the impact of one type of noise, called class noise, on a learner’s ability to select test cases. Understanding the impact of class noise on a learner’s performance for test case selection would help testers decide on the appropriateness of different noise-handling strategies. For this purpose, we design and implement a controlled experiment on an industrial data set to measure the impact of class noise, at six different levels, on the predictive performance of a learner. We measure learning performance using the Precision, Recall, F-score, and Matthews Correlation Coefficient (MCC) metrics. The results show a statistically significant relationship between class noise and the learner’s performance for test case selection. In particular, for three performance measures (Precision, F-score, and MCC), we found a significant difference between each of the six noise levels and the 0% level, whereas a similar relationship between Recall and class noise appeared only at noise levels above 30%. We conclude that higher class noise ratios lead to more tests being missed in the predicted subset of the test suite, and to an increased rate of false alarms once the class noise ratio exceeds 30%.
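The experimental design described in the abstract — inject class (label) noise into the training data at increasing ratios, retrain the learner, and track Precision, Recall, F-score, and MCC — can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the industrial data set is not public, so a synthetic data set and a decision-tree classifier stand in for the real data and learner; `flip_labels` is a hypothetical helper introduced here for the noise injection.

```python
# Sketch: measure a learner's performance under increasing class-noise
# ratios. Synthetic data and a decision tree are stand-ins for the
# paper's (non-public) industrial data set and learner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (precision_score, recall_score,
                             f1_score, matthews_corrcoef)

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

def flip_labels(y, ratio, rng):
    """Inject class noise: flip `ratio` of the labels (0 <-> 1)."""
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(ratio * len(y)), replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]
    return y_noisy

for ratio in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:  # six noise levels
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_tr, flip_labels(y_tr, ratio, rng))  # train on noisy labels
    pred = clf.predict(X_te)                      # evaluate on clean labels
    print(f"noise={ratio:.0%}  P={precision_score(y_te, pred):.2f}  "
          f"R={recall_score(y_te, pred):.2f}  F={f1_score(y_te, pred):.2f}  "
          f"MCC={matthews_corrcoef(y_te, pred):.2f}")
```

Note that only the training labels are corrupted; the held-out test labels stay clean, so the metrics reflect how label noise during training degrades a learner's predictions on correctly labelled data.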


Notes

  1. Due to non-disclosure agreements with our industrial partner, our data set can unfortunately not be made public for replication.

  2. https://github.com/khaledwalidsabbagh/noise_free_set.git.


Author information
Correspondence to Khaled Walid Al-Sabbagh.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Al-Sabbagh, K.W., Hebig, R., Staron, M. (2020). The Effect of Class Noise on Continuous Test Case Selection: A Controlled Experiment on Industrial Data. In: Morisio, M., Torchiano, M., Jedlitschka, A. (eds) Product-Focused Software Process Improvement. PROFES 2020. Lecture Notes in Computer Science, vol. 12562. Springer, Cham. https://doi.org/10.1007/978-3-030-64148-1_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-64148-1_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64147-4

  • Online ISBN: 978-3-030-64148-1

  • eBook Packages: Computer Science (R0)
