Abstract
The formal synthesis of control policies is a classic problem that entails computing optimal strategies for an agent interacting with an environment so that prescribed formal guarantees on its behavior are met. These guarantees are specified as part of the problem description and can be supplied by the end user in the form of various logics (e.g., Linear Temporal Logic and Computation Tree Logic) or imposed via constraints on agent-environment interactions. The latter has received significant attention in recent years in the context of constraints on the asymptotic frequency with which an agent visits states of interest, which is captured by the steady-state distribution of the agent. The formal synthesis of stochastic policies satisfying constraints on this distribution has been studied, but the derivation of deterministic policies for the same problem has received little attention. In this paper, we focus on this deterministic steady-state control problem, i.e., the problem of obtaining a deterministic policy that maximizes expected reward subject to linear constraints representing the desired steady-state behavior. Two integer linear programs are proposed and validated experimentally to solve this problem in unichain and multichain Markov decision processes. Finally, we prove that the problem is NP-hard even in the restricted setting where the MDP has deterministic transitions and only two actions.
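To convey the flavor of such formulations, the sketch below gives one natural integer-linear-programming encoding of the unichain case; it is an illustrative formulation under assumed per-state steady-state bounds \(\ell(s) \le u(s)\), not necessarily the exact programs developed in the paper. The variables \(x(s,a)\) denote the stationary state-action frequencies induced by a policy, and the binary variables \(\delta(s,a)\) restrict those frequencies to a single action per state.

\[
\begin{aligned}
\max_{x,\;\delta}\quad & \sum_{s\in S}\sum_{a\in A} R(s,a)\,x(s,a) && \text{(expected average reward)}\\
\text{s.t.}\quad & \sum_{a\in A} x(s',a) \;=\; \sum_{s\in S}\sum_{a\in A} P(s'\mid s,a)\,x(s,a) && \forall s'\in S \quad \text{(stationarity)}\\
& \sum_{s\in S}\sum_{a\in A} x(s,a) \;=\; 1 && \text{(total probability)}\\
& \ell(s) \;\le\; \sum_{a\in A} x(s,a) \;\le\; u(s) && \forall s\in S \quad \text{(steady-state constraints)}\\
& x(s,a) \;\le\; \delta(s,a),\quad \sum_{a\in A}\delta(s,a) = 1,\quad \delta(s,a)\in\{0,1\} && \forall s\in S,\ a\in A \quad \text{(determinism)}\\
& x(s,a) \;\ge\; 0 && \forall s\in S,\ a\in A.
\end{aligned}
\]

Without the binary variables \(\delta\), this is the standard linear program whose optimal solution yields a stochastic policy \(\pi(a\mid s) = x(s,a)/\sum_{a'} x(s,a')\) on states with positive stationary mass; the coupling constraints \(x(s,a)\le\delta(s,a)\) and \(\sum_{a}\delta(s,a)=1\) allow at most one action per state to carry mass, so reading the policy off \(\delta\) yields a deterministic controller. The multichain case requires additional care, since the steady-state distribution then depends on which recurrent class the agent enters.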
Additional information
This research was supported in part by the Air Force Office of Scientific Research through Grant FA9550-19-1-0177 and in part by the Air Force Research Laboratory, Rome, through Contract FA8750-17-S-7007.
About this article
Cite this article
Velasquez, A., Alkhouri, I., Subramani, K. et al. Optimal Deterministic Controller Synthesis from Steady-State Distributions. J Autom Reasoning 67, 7 (2023). https://doi.org/10.1007/s10817-022-09657-9