Abstract
High-quality unit tests play a crucial role in discovering and diagnosing errors early and in preventing them from propagating through the development cycle. However, the test cases produced by existing automated tools have low readability, which hinders developers from using them directly. In addition, current approaches are sensitive to individual words in the input code and often produce completely different results after minor changes to it. To tackle these problems, we propose AssertGen, a Java assertion generation model that maintains consistent output under minor variations in code snippets. Inspired by software mutation testing, we propose 11 heuristic code mutation strategies that make minor changes to the code text or its structural information, aiming to generate variant code that is human-readable but misleading to the model. We then use the variant code to attack the model and test its robustness. We observe that the mutation based on variable names (VM), the mutation based on method names (FM), and False_Control_Flow, which adds extra control flow, have the greatest impact on the quality of the generated assertions. To enhance the robustness of AssertGen, we expand the original dataset with multiple mutations, allowing the model to learn during training to counter the instability that mutations cause. Experimental results show that our assertion generation model achieves a BLEU score of 60.08 and a perfect prediction rate of 47.91%, significantly surpassing previous work.
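To make the idea of a code mutation concrete, the following is a minimal Java sketch of two of the mutation kinds named in the abstract: a variable-name mutation and a false-control-flow mutation. It is an illustration under stated assumptions, not the authors' implementation. The class name `MutationSketch`, both method names, and the purely text-based approach are hypothetical; a realistic tool would more likely operate on a parsed syntax tree so that string literals and comments are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of two text-level mutations; not the AssertGen implementation. */
final class MutationSketch {

    /**
     * Variable-name mutation (assumed form): renames every whole-word
     * occurrence of oldName. Being purely textual, it would also rewrite
     * matches inside string literals and comments.
     */
    static String renameIdentifier(String code, String oldName, String newName) {
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(oldName) + "\\b");
        return wholeWord.matcher(code).replaceAll(Matcher.quoteReplacement(newName));
    }

    /**
     * False-control-flow mutation (assumed form): prepends a branch that can
     * never execute, so the behaviour of the code is unchanged.
     */
    static String addFalseControlFlow(String statements) {
        return "if (false) { System.out.println(\"unreachable\"); }\n" + statements;
    }

    public static void main(String[] args) {
        String original = "int sum = a + b; assertEquals(3, sum);";
        // Print both variants so they can be compared with the original by eye.
        System.out.println(renameIdentifier(original, "sum", "v0"));
        System.out.println(addFalseControlFlow(original));
    }
}
```

Both variants remain readable to a human and preserve behaviour, which is the property the mutation strategies aim for. Feeding such variants to a model and comparing the generated assertions against those for the original input is one way to probe its robustness.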
Acknowledgments
This project was funded by the National Natural Science Foundation of China (62032016, 61832014).
Copyright information
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Li, M. et al. (2024). Robustness-Enhanced Assertion Generation Method Based on Code Mutation and Attack Defense. In: Gao, H., Wang, X., Voros, N. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 562. Springer, Cham. https://doi.org/10.1007/978-3-031-54528-3_16
DOI: https://doi.org/10.1007/978-3-031-54528-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54527-6
Online ISBN: 978-3-031-54528-3
eBook Packages: Computer Science, Computer Science (R0)