An Approach of Transforming Non-Markovian Reward to Markovian Reward

  • Conference paper
Structured Object-Oriented Formal Language and Method (SOFL+MSVL 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13854)

Abstract

In many decision-making problems, a rational reward function is required to correctly guide agents toward ideal behavior. For example, an intelligent robot needs to check its power before sweeping. Such a reward function depends on the history of states rather than on the current state alone; it is referred to as a non-Markovian reward. However, state-of-the-art MDP (Markov Decision Process) planners support only Markovian rewards. In this paper, we present an approach to transforming non-Markovian rewards expressed in \({LTL}_{f}\) (Linear Temporal Logic over Finite Traces) into Markovian rewards. The \({LTL}_{f}\) formula is converted into an automaton, which is compiled into a standard MDP model. The reward function of the resulting model is then further optimized through reward shaping in order to speed up planning. The reshaped reward function can be exploited by MDP planners to guide search and produce good training results. Finally, experiments on augmented International Probabilistic Planning Competition (IPPC) domains demonstrate the effectiveness and feasibility of our approach; in particular, the reshaped reward function significantly improves the performance of planners.
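To make the construction in the abstract concrete, the sketch below shows the two steps in miniature: taking the product of an MDP with a DFA (such as one compiled from an \({LTL}_{f}\) formula) yields a reward that is Markovian over augmented states \((s, q)\), and classic potential-based reward shaping densifies that reward for the planner. This is an illustrative sketch only, not the paper's implementation; the toy DFA, the function names (`make_product_step`, `shape`), the potential values, and the reward constants are all our own assumptions.

```python
"""Illustrative sketch (not the paper's code): make an LTLf-style reward
Markovian by tracking a DFA state alongside the MDP state, then apply
potential-based reward shaping on the product MDP. All names, the toy
DFA, and all numeric constants below are assumptions for illustration."""

from collections import namedtuple

DFA = namedtuple("DFA", ["init", "accepting", "delta"])

# Toy DFA for the abstract's example "check power before sweeping":
# q0 -check-> q1 -sweep-> q2 (accepting); sweeping before checking falls
# into a rejecting sink q3. A real pipeline would compile this from LTLf.
def robot_delta(q, label):
    if q == 0:
        if "sweep" in label:
            return 3                       # swept before checking: reject
        return 1 if "check" in label else 0
    if q == 1:
        return 2 if "sweep" in label else 1
    return q                               # q2 and q3 are absorbing

ROBOT_DFA = DFA(init=0, accepting={2}, delta=robot_delta)

def make_product_step(mdp_step, dfa, label_fn, ltlf_reward=100.0):
    """Lift mdp_step(s, a) -> (s', r) to augmented states (s, q).

    The LTLf bonus is paid exactly when the automaton first enters an
    accepting state, so the combined reward depends only on the current
    augmented state and action: it is Markovian in (s, q)."""
    def step(aug, action):
        s, q = aug
        s2, base_r = mdp_step(s, action)
        q2 = dfa.delta(q, label_fn(s2))
        done = q2 in dfa.accepting and q not in dfa.accepting
        return (s2, q2), base_r + (ltlf_reward if done else 0.0)
    return step

def shape(step, phi, gamma=0.99):
    """Potential-based shaping: add gamma*phi(x') - phi(x) to each reward.

    This preserves the optimal policies of the product MDP while giving
    the planner denser guidance toward accepting automaton states."""
    def shaped_step(aug, action):
        nxt, r = step(aug, action)
        return nxt, r + gamma * phi(nxt) - phi(aug)
    return shaped_step

# Tiny usage example with stub dynamics: a state is the set of actions
# performed so far, and every step costs 1.
def mdp_step(s, action):
    return s | {action}, -1.0

phi = lambda aug: {0: 0.0, 1: 50.0, 2: 100.0, 3: 0.0}[aug[1]]
step = shape(make_product_step(mdp_step, ROBOT_DFA, label_fn=lambda s: s), phi)

aug, total = (frozenset(), ROBOT_DFA.init), 0.0
for a in ["check", "sweep"]:
    aug, r = step(aug, a)
    total += r
print(aug[1], total)  # reaches accepting state 2 with positive return
```

Because the shaping term telescopes along any trajectory, an optimal policy for the shaped product MDP is also optimal for the unshaped one; the only effect is to make progress through the automaton visible to the planner earlier.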

This research is supported by the National Natural Science Foundation of China (61806158), the China Postdoctoral Science Foundation (2019T120881, 2018M643585), the Fundamental Research Funds for the Central Universities (XJS220304), and the Special Scientific Research Project of the Education Department of Shaanxi Province (21JK0844).

Author information

Corresponding author

Correspondence to Xu Lu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Miao, R., Lu, X., Cui, J. (2023). An Approach of Transforming Non-Markovian Reward to Markovian Reward. In: Liu, S., Duan, Z., Liu, A. (eds) Structured Object-Oriented Formal Language and Method. SOFL+MSVL 2022. Lecture Notes in Computer Science, vol 13854. Springer, Cham. https://doi.org/10.1007/978-3-031-29476-1_2

  • DOI: https://doi.org/10.1007/978-3-031-29476-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29475-4

  • Online ISBN: 978-3-031-29476-1

  • eBook Packages: Computer Science, Computer Science (R0)
