Robustness-Enhanced Assertion Generation Method Based on Code Mutation and Attack Defense

Li, Min; Chen, Shizhan; Fan, Guodong; Zhang, Lu; Wu, Hongyue; Xue, Xiao; Feng, Zhiyong

doi:10.1007/978-3-031-54528-3_16

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 562)

Included in the following conference series:

International Conference on Collaborative Computing: Networking, Applications and Worksharing

Abstract

Writing high-quality unit tests plays a crucial role in discovering and diagnosing errors early and in preventing them from propagating through the development cycle. However, the test cases produced by existing automated tools have low readability, which hinders developers from using them directly. In addition, current approaches are sensitive to individual words in the input code and often produce completely different results after minor changes to it. To tackle these problems, we propose AssertGen, a Java assertion generation model that maintains consistent output under minor variations in code snippets. Inspired by software mutation testing, we propose 11 heuristic strategies for code mutation that make minor changes to code text or structural information, aiming to generate variant code that is human-readable but misleading to the model. We then use the variant code to attack the model and test its robustness. We observe that the mutation based on variable names (VM), the mutation based on method names (FM), and the False_Control_Flow mutation, which adds extra control flow, have the greatest impact on the quality of the assertions the model generates. To enhance the robustness of AssertGen, we use multiple mutations to expand the original dataset, allowing the model to learn during training how to counter the instability caused by mutations. Experimental results show that our assertion generation model achieves a BLEU score of 60.08 and a perfect prediction rate of 47.91%, significantly surpassing previous work.
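
To make the mutation idea concrete, the following Python sketch shows what three of the mutation families named in the abstract (variable names, method names, added control flow) might look like on a small Java method, and how mutated variants could be paired with the unchanged target assertion to expand a training set. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the mutation rules, so the function names, the regex-based renaming, the placeholder identifiers `var0` and `method0`, and the sample Java method are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical focal method; not taken from the paper's dataset.
JAVA_SNIPPET = """\
public int add(int left, int right) {
    int sum = left + right;
    return sum;
}"""


def rename_identifier(code: str, old: str, new: str) -> str:
    """Rename whole-word occurrences of `old` (a stand-in for VM- or FM-style mutation)."""
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def add_false_control_flow(code: str) -> str:
    """Insert a never-taken branch right after the first opening brace.

    Assumption: this only approximates the spirit of False_Control_Flow;
    the abstract does not specify what control flow the paper inserts.
    """
    dead_branch = "\n    if (false) { int unused = 0; }"
    head, brace, body = code.partition("{")
    return head + brace + dead_branch + body


def augment(code: str, assertion: str) -> List[Tuple[str, str]]:
    """Pair the original and each mutated variant with the same target assertion."""
    variants = [
        code,
        rename_identifier(code, "sum", "var0"),     # variable-name mutation
        rename_identifier(code, "add", "method0"),  # method-name mutation
        add_false_control_flow(code),               # added dead control flow
    ]
    return [(variant, assertion) for variant in variants]


if __name__ == "__main__":
    for variant, target in augment(JAVA_SNIPPET, "assertEquals(3, result)"):
        print(variant)
        print("->", target)
        print()
```

Each variant keeps the method's behaviour unchanged, so the same assertion remains a valid target. That property is what lets the variants serve both as attack inputs and as additional training examples in a sketch like this one.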

Acknowledgments

This project was funded by the National Natural Science Foundation of China (62032016, 61832014).

Author information

Authors and Affiliations

Tianjin University, Tianjin, China
Min Li, Shizhan Chen, Guodong Fan, Lu Zhang, Hongyue Wu, Xiao Xue & Zhiyong Feng

Corresponding author

Correspondence to Hongyue Wu.

Editor information

Editors and Affiliations

Shanghai University, Shanghai, China
Honghao Gao
Xi’an Jiaotong-Liverpool, Suzhou, China
Xinheng Wang
University of Peloponnese, Patra, Greece
Nikolaos Voros

Cite this paper

Li, M. et al. (2024). Robustness-Enhanced Assertion Generation Method Based on Code Mutation and Attack Defense. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 562. Springer, Cham. https://doi.org/10.1007/978-3-031-54528-3_16

DOI: https://doi.org/10.1007/978-3-031-54528-3_16
Published: 23 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54527-6
Online ISBN: 978-3-031-54528-3
eBook Packages: Computer Science, Computer Science (R0)
