
Learning Temporal Task Specifications From Demonstrations

  • Conference paper
Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS 2024)

Abstract

As reinforcement learning moves towards real-world deployment, interpretability becomes increasingly important for ensuring the safety and reliability of intelligent agents. This paper tackles the challenge of acquiring task specifications in linear temporal logic from expert demonstrations, aiming to alleviate the burden of manual specification engineering. The rich semantics of temporal logic provide an interpretable framework for describing complex, multi-stage tasks. We propose a method that iteratively learns a task specification together with a nominal policy solving that task. In each iteration, the task specification is refined to better distinguish expert trajectories from trajectories sampled from the nominal policy, yielding a concise and interpretable task specification. Unlike previous work, our method learns directly from trajectories in the original state space and does not require predefined atomic propositions. We demonstrate the effectiveness of our method on multiple tasks in both an office environment and a Minecraft-inspired environment.
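
To make the iterative procedure described above concrete, the following is a minimal, hypothetical Python sketch of the loop: a nominal policy is trained against the current specification, trajectories sampled from it serve as negative examples, and the specification is refined to separate them from the expert demonstrations. All names here (`train_policy`, `sample_trajectories`, `refine_spec`) are placeholder assumptions for illustration, not the authors' actual API; see the linked repository in the Notes for the real implementation.

```python
# Minimal, hypothetical sketch of the iterative loop described in the abstract.
# The callables train_policy, sample_trajectories, and refine_spec are
# placeholders for the paper's components, not the authors' actual API.

def learn_task_specification(expert_trajs, env, n_iterations,
                             train_policy, sample_trajectories, refine_spec):
    """Alternate between training a nominal policy for the current LTL
    specification and refining the specification so that it accepts the
    expert trajectories while rejecting the nominal policy's trajectories."""
    spec = "G true"  # trivially satisfied initial LTL formula
    policy = None
    for _ in range(n_iterations):
        # 1) Learn a nominal policy that tries to satisfy the current spec,
        #    e.g. via RL with rewards derived from the spec's automaton.
        policy = train_policy(env, spec)
        # 2) Roll out the nominal policy; these trajectories act as
        #    negative examples for the next refinement step.
        nominal_trajs = sample_trajectories(env, policy)
        # 3) Refine the spec to separate expert (positive) trajectories
        #    from nominal (negative) ones, preferring concise formulas
        #    for interpretability.
        new_spec = refine_spec(positives=expert_trajs, negatives=nominal_trajs)
        if new_spec == spec:
            # The spec can no longer distinguish the two sets: converged.
            break
        spec = new_spec
    return spec, policy
```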


Notes

  1. https://gitlab.ilabt.imec.be/mwbaert/l-ltl-fd.


Acknowledgements

S.L. and P.S. acknowledge the financial support from the Flanders AI Research Program.

Author information

Correspondence to Mattijs Baert.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Baert, M., Leroux, S., Simoens, P. (2024). Learning Temporal Task Specifications From Demonstrations. In: Calvaresi, D., et al. (eds.) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2024. Lecture Notes in Computer Science, vol. 14847. Springer, Cham. https://doi.org/10.1007/978-3-031-70074-3_5

  • DOI: https://doi.org/10.1007/978-3-031-70074-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70073-6

  • Online ISBN: 978-3-031-70074-3

  • eBook Packages: Computer Science, Computer Science (R0)
