Abstract
As reinforcement learning moves toward real-world deployment, interpretability becomes essential for ensuring the safety and reliability of intelligent agents. This paper addresses the challenge of learning task specifications in linear temporal logic from expert demonstrations, with the aim of alleviating the burden of manual specification engineering. The rich semantics of temporal logics provide an interpretable framework for describing complex, multi-stage tasks. We propose a method that iteratively learns a task specification together with a nominal policy that solves this task. In each iteration, the task specification is refined to better distinguish expert trajectories from trajectories sampled from the nominal policy. This process yields a concise and interpretable task specification. Unlike previous work, our method learns directly from trajectories in the original state space and does not require predefined atomic propositions. We demonstrate its effectiveness on multiple tasks in both an office environment and a Minecraft-inspired environment.
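The iterative loop described above can be sketched in miniature. This is an illustrative sketch only: the event-sequence trajectory encoding, the fixed "eventually a, then eventually b" template, and the separation score are assumptions made for this example, not the paper's method, which learns richer LTL formulas directly from state-space trajectories without predefined atomic propositions.

```python
def satisfies(traj, spec):
    """Check the pattern F(a & F b): event a occurs, then later event b."""
    a, b = spec
    if a not in traj:
        return False
    return b in traj[traj.index(a) + 1:]

def refine_spec(expert_trajs, nominal_trajs, events):
    """Pick the candidate spec that best separates expert from nominal runs."""
    candidates = [(a, b) for a in events for b in events if a != b]

    def separation(spec):
        # expert trajectories should satisfy the spec; nominal ones should not
        return (sum(satisfies(t, spec) for t in expert_trajs)
                - sum(satisfies(t, spec) for t in nominal_trajs))

    return max(candidates, key=separation)

def learn_specification(expert_trajs, sample_nominal, events, iters=5):
    """Alternate between refining the spec and resampling the nominal policy."""
    spec = None
    for _ in range(iters):
        nominal_trajs = sample_nominal(spec)  # roll out the current policy
        spec = refine_spec(expert_trajs, nominal_trajs, events)
    return spec
```

In the paper the nominal policy is retrained against the current specification at each iteration; here the `sample_nominal` callback stands in for that reinforcement-learning step.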
Acknowledgements
S.L. and P.S. acknowledge the financial support from the Flanders AI Research Program.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Baert, M., Leroux, S., Simoens, P. (2024). Learning Temporal Task Specifications From Demonstrations. In: Calvaresi, D., et al. Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2024. Lecture Notes in Computer Science(), vol 14847. Springer, Cham. https://doi.org/10.1007/978-3-031-70074-3_5
Print ISBN: 978-3-031-70073-6
Online ISBN: 978-3-031-70074-3