Abstract
In software development, the ability to localize faults is crucial to efficient debugging. Detecting and repairing errant behavior early in the development cycle considerably reduces cost and development time. Researchers have proposed various methods to locate faulty code. However, failing test cases usually account for only a small portion of a test suite, which inevitably leads to class imbalance and hampers the effectiveness of fault localization.
Accordingly, in this work, we propose a new fault localization approach named ContextAug. After obtaining dynamic executions from test cases, ContextAug traces these executions to build an information model. It then constructs a failure context with propagation dependencies and intersects this context with new model-domain failing test samples, which are synthesized by the minimum variability of the minority feature space. In contrast to traditional test generation, which works directly from the input domain, ContextAug synthesizes failing test samples from the model domain, where augmenting a test suite is much easier. In an empirical study on real large-sized programs with 13 state-of-the-art fault localization approaches, ContextAug significantly improves fault localization effectiveness, by up to 54.53%.
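To make the idea of model-domain augmentation concrete, the following is a minimal sketch and not the ContextAug implementation. It assumes the information model is a tests-by-statements coverage matrix, that the failure context is already given as a set of statement indices, and that new failing samples are produced by SMOTE-style interpolation between a failing test and its nearest failing neighbour. The function names `augment_failing` and `ochiai`, the toy data, and the choice of Ochiai as the ranking formula are all illustrative assumptions.

```python
import random


def ochiai(coverage, labels):
    """Ochiai suspiciousness per statement; any value > 0 counts as covered."""
    total_fail = sum(labels)
    scores = []
    for j in range(len(coverage[0])):
        ef = sum(1 for row, y in zip(coverage, labels) if row[j] > 0 and y == 1)
        ep = sum(1 for row, y in zip(coverage, labels) if row[j] > 0 and y == 0)
        denom = (total_fail * (ef + ep)) ** 0.5
        scores.append(ef / denom if denom else 0.0)
    return scores


def augment_failing(coverage, labels, context, seed=0):
    """Append synthetic failing rows until failing and passing counts match.

    Assumption: each failing row is first projected onto the failure
    context (statements outside `context` are set to 0), then new rows
    are interpolated between a failing row and its nearest failing
    neighbour, in the style of SMOTE.
    """
    rng = random.Random(seed)
    keep = set(context)
    failing = [
        [v if j in keep else 0 for j, v in enumerate(row)]
        for row, y in zip(coverage, labels)
        if y == 1
    ]
    if not failing:
        raise ValueError("at least one failing test is required")

    rows, new_labels = [list(r) for r in coverage], list(labels)
    needed = labels.count(0) - labels.count(1)
    for _ in range(max(0, needed)):
        base = rng.choice(failing)
        others = [f for f in failing if f is not base]
        if others:
            # Nearest failing neighbour by squared Euclidean distance.
            neighbour = min(
                others, key=lambda f: sum((a - b) ** 2 for a, b in zip(base, f))
            )
        else:
            # With a single failing test, the sample equals its projection.
            neighbour = base
        gap = rng.random()
        rows.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        new_labels.append(1)
    return rows, new_labels


# Toy suite: three passing tests, one failing test, four statements.
coverage = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]
labels = [0, 0, 0, 1]
context = [2, 3]  # hypothetical failure context (statement indices)

before = ochiai(coverage, labels)
aug_rows, aug_labels = augment_failing(coverage, labels, context)
after = ochiai(aug_rows, aug_labels)
```

In this toy suite, the synthetic failing samples touch only the statements in the assumed failure context, so balancing the classes raises the scores of those statements relative to statements that every test executes. A real information model, context construction, and synthesis strategy would be considerably richer than this sketch.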
Author information
Zhuo Zhang received his BA in computer science and technology and his MA and PhD degrees in software engineering, all from the National University of Defense Technology, China. His research interests include fault localization and intelligent software technology.
Jianxin Xue received his MS in software engineering from the National University of Defense Technology, China, and his PhD in computer software and theory from Shanghai Jiao Tong University, China. He is an associate professor in the School of Computer and Information Engineering, Institute for Artificial Intelligence, Shanghai Polytechnic University, China. His primary research interests include concurrency theory and the analysis of concurrent programs.
Deheng Yang is currently a Master's student at the National University of Defense Technology, China, under the supervision of Dr. Xiaoguang Mao. He received his BA in computer science and technology from the National University of Defense Technology, China. His research interests include fault localization and automated program repair.
Xiaoguang Mao is a professor at the College of Computer, National University of Defense Technology, China. His research interests include high confidence software, software development methodology, software assurance, and software service engineering.
Cite this article
Zhang, Z., Xue, J., Yang, D. et al. ContextAug: model-domain failing test augmentation with contextual information. Front. Comput. Sci. 18, 182202 (2024). https://doi.org/10.1007/s11704-023-2521-2