Abstract
The paper studies how LLM-based code generation can be combined with formal verification to produce critical embedded software. The first contribution is a general framework, spec2code, in which LLMs are combined with different types of critics that produce feedback for iterative backprompting and fine-tuning. The second contribution is a first feasibility study, in which a minimalistic instantiation of spec2code, without iterative backprompting or fine-tuning, is empirically evaluated using three industrial case studies from the heavy vehicle manufacturer Scania. The goal is to automatically generate industrial-quality code from specifications alone. Different combinations of formal ACSL specifications and natural-language specifications are explored. The results indicate that formally correct code can be generated even without iterative backprompting or fine-tuning.
M. S. Patil—Work was done while the author was at Scania.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Patil, M.S., Ung, G., Nyberg, M. (2025). Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software. In: Steffen, B. (ed.) Bridging the Gap Between AI and Reality. AISoLA 2024. Lecture Notes in Computer Science, vol. 15217. Springer, Cham. https://doi.org/10.1007/978-3-031-75434-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75433-3
Online ISBN: 978-3-031-75434-0
eBook Packages: Computer Science, Computer Science (R0)