
ContextAug: model-domain failing test augmentation with contextual information

Research Article, Frontiers of Computer Science

Abstract

During software development, the ability to localize faults is crucial to debugging efficiency. Detecting and repairing errant behavior early in the development cycle considerably reduces cost and development time, and researchers have proposed various methods to locate faulty code. However, failing test cases usually make up only a small portion of a test suite, which leads to class imbalance and hampers the effectiveness of fault localization.
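To make the class-imbalance problem concrete: spectrum-based fault localization (SBFL) commonly represents test executions as a coverage matrix (one row per test, one column per statement) plus a pass/fail vector, and ranks statements with a suspiciousness formula. The sketch below is illustrative only. It uses Ochiai, a standard SBFL formula, on a hypothetical six-test, four-statement matrix; neither the data nor the choice of formula comes from the ContextAug evaluation. With a single failing test, the failing-side count of every statement is at most 1.

```python
import math
from typing import List

def ochiai(matrix: List[List[int]], results: List[int]) -> List[float]:
    """Ochiai suspiciousness for each statement.

    matrix[i][j] == 1 if test i executes statement j.
    results[i] == 1 if test i fails, 0 if it passes.
    """
    total_fail = sum(results)
    scores = []
    for j in range(len(matrix[0])):
        # ef: failing tests that execute j; ep: passing tests that execute j
        ef = sum(1 for row, r in zip(matrix, results) if row[j] and r == 1)
        ep = sum(1 for row, r in zip(matrix, results) if row[j] and r == 0)
        # Ochiai: ef / sqrt((ef + nf) * (ef + ep)), where ef + nf == total_fail
        denom = math.sqrt(total_fail * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores

# Hypothetical toy data: five passing tests and one failing test (last row).
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]
results = [0, 0, 0, 0, 0, 1]

print(f"failing:passing = {sum(results)}:{len(results) - sum(results)}")
# failing:passing = 1:5
print([round(s, 3) for s in ochiai(coverage, results)])
# [0.408, 0.0, 0.577, 0.5]
```

Statement 2 ranks highest, but because only one failing execution exists, every statement it covers scores 1/sqrt(1 + ep), so the ranking is driven almost entirely by how many passing tests cover each statement.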

To address this problem, we propose a new fault localization approach named ContextAug. ContextAug first runs the test cases and traces their dynamic executions to build an information model. It then constructs a failure context with propagation dependencies and intersects it with new model-domain failing test samples, which are synthesized by applying minimum variability to the minority (failing) feature space. In contrast to traditional test generation, which works directly on the input domain, ContextAug synthesizes failing test samples in the model domain, which makes augmenting a test suite much easier. In an empirical study on large real-world programs with 13 state-of-the-art fault localization approaches, ContextAug improved fault localization effectiveness by up to 54.53%. These results show that ContextAug can significantly improve fault localization effectiveness.
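The augmentation idea can be sketched on a coverage matrix. This is a minimal sketch under our own assumptions, not the authors' algorithm: the failure context is assumed to be already available as a set of statement indices (ContextAug derives it from propagation dependencies; here it is simply given), "minimum variability of the minority feature space" is interpreted as keeping only the statements that every real failing test executes, and synthetic failing rows are added until failing and passing tests are equally numerous. The names `synthesize_failing_sample` and `augment` are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]

def synthesize_failing_sample(matrix: Matrix, results: List[int],
                              context: Set[int]) -> List[int]:
    """Build one synthetic failing coverage vector (assumed scheme).

    A statement is kept only if (a) every real failing test executes it,
    our stand-in for the least-varying part of the minority feature space,
    and (b) it belongs to the given failure context.
    """
    failing_rows = [row for row, r in zip(matrix, results) if r == 1]
    if not failing_rows:
        raise ValueError("need at least one failing test")
    n_stmts = len(matrix[0])
    return [1 if all(row[j] for row in failing_rows) and j in context else 0
            for j in range(n_stmts)]

def augment(matrix: Matrix, results: List[int],
            context: Set[int]) -> Tuple[Matrix, List[int]]:
    """Append synthetic failing rows until both classes have equal size."""
    n_fail = sum(results)
    n_pass = len(results) - n_fail
    sample = synthesize_failing_sample(matrix, results, context)
    extra = max(0, n_pass - n_fail)
    new_matrix = [list(row) for row in matrix] + [list(sample) for _ in range(extra)]
    new_results = list(results) + [1] * extra
    return new_matrix, new_results

# Hypothetical toy data; the failure context {2, 3} is assumed, not computed.
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]
results = [0, 0, 0, 0, 0, 1]

aug_cov, aug_res = augment(coverage, results, context={2, 3})
print(aug_cov[-1])                  # [0, 0, 1, 1]
print(sum(aug_res), len(aug_res))   # 5 10
```

Because each synthetic row executes only statements that are both common to the real failures and inside the failure context, it raises the failing-side counts of exactly those statements and leaves the others untouched; the augmented matrix can then be passed unchanged to any suspiciousness formula or learning-based localizer.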



Author information

Correspondence to Jianxin Xue.

Additional information

Zhuo Zhang received his BA in computer science and technology and his MA and PhD degrees in software engineering, all from the National University of Defense Technology, China. His research interests include fault localization and intelligent software technology.

Jianxin Xue received his MS in software engineering from the National University of Defense Technology, China, and his PhD in computer software and theory from Shanghai Jiao Tong University, China. He is an associate professor in the School of Computer and Information Engineering, Institute for Artificial Intelligence, Shanghai Polytechnic University, China. His primary research interests are concurrency theory and the analysis of concurrent programs.

Deheng Yang is currently a master's student at the National University of Defense Technology, China, under the supervision of Dr. Xiaoguang Mao. He received his BA in computer science and technology from the National University of Defense Technology, China. His research interests include fault localization and automated program repair.

Xiaoguang Mao is a professor at the College of Computer, National University of Defense Technology, China. His research interests include high-confidence software, software development methodology, software assurance, and software service engineering.

About this article

Cite this article

Zhang, Z., Xue, J., Yang, D. et al. ContextAug: model-domain failing test augmentation with contextual information. Front. Comput. Sci. 18, 182202 (2024). https://doi.org/10.1007/s11704-023-2521-2

  • DOI: https://doi.org/10.1007/s11704-023-2521-2
