Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software

  • Conference paper
Bridging the Gap Between AI and Reality (AISoLA 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15217)

Abstract

The paper studies how code generation by LLMs can be combined with formal verification to produce critical embedded software. The first contribution is a general framework, spec2code, in which LLMs are combined with different types of critics that produce feedback for iterative backprompting and fine-tuning. The second contribution presents a first feasibility study, where a minimalistic instantiation of spec2code, without iterative backprompting and fine-tuning, is empirically evaluated using three industrial case studies from the heavy vehicle manufacturer Scania. The goal is to automatically generate industrial-quality code from specifications only. Different combinations of formal ACSL specifications and natural language specifications are explored. The results indicate that formally correct code can be generated even without the application of iterative backprompting and fine-tuning.
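
As an illustration of what a formal ACSL specification looks like, the following minimal sketch annotates a small C function with a contract. The function `clamp_request`, its parameters, and the self-check helper are hypothetical and are not taken from the paper or from the Scania case studies. The contract uses only standard ACSL clauses (`requires`, `assigns`, `ensures`).

```c
#include <assert.h>

/* Hypothetical example: limit a requested value to the range [lo, hi].
 * The ACSL contract below states what a verifier should prove about it. */

/*@ requires lo <= hi;
  @ assigns \nothing;
  @ ensures lo <= \result <= hi;
  @ ensures (value < lo) ==> \result == lo;
  @ ensures (value > hi) ==> \result == hi;
  @ ensures (lo <= value <= hi) ==> \result == value;
  @*/
int clamp_request(int value, int lo, int hi)
{
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}

/* Runtime spot checks of the same properties. These only sample a few
 * inputs; they do not replace a deductive proof of the contract. */
void clamp_request_self_check(void)
{
    assert(clamp_request(-5, 0, 100) == 0);    /* below range */
    assert(clamp_request(250, 0, 100) == 100); /* above range */
    assert(clamp_request(42, 0, 100) == 42);   /* inside range */
}
```

Assuming Frama-C with the WP plug-in is installed, a contract of this form could be checked with a command such as `frama-c -wp file.c`. In a specification-driven setting, only the annotation would be given and the function body would be the part to generate.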

M. S. Patil—Work was done while the author was at Scania.

Author information

Corresponding author

Correspondence to Minal Suresh Patil.

Appendix A

[Figures d and e are not reproduced here.]

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Patil, M.S., Ung, G., Nyberg, M. (2025). Towards Specification-Driven LLM-Based Generation of Embedded Automotive Software. In: Steffen, B. (eds) Bridging the Gap Between AI and Reality. AISoLA 2024. Lecture Notes in Computer Science, vol 15217. Springer, Cham. https://doi.org/10.1007/978-3-031-75434-0_9

  • DOI: https://doi.org/10.1007/978-3-031-75434-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75433-3

  • Online ISBN: 978-3-031-75434-0

  • eBook Packages: Computer Science, Computer Science (R0)
