Abstract
The formal synthesis of control policies is a classic problem that entails computing optimal strategies for an agent interacting with an environment so that prescribed formal guarantees on its behavior are met. These guarantees are specified as part of the problem description and can be supplied by the end user in the form of various logics (e.g., Linear Temporal Logic and Computation Tree Logic) or imposed via constraints on agent-environment interactions. The latter has received significant attention in recent years in the context of constraints on the asymptotic frequency with which an agent visits states of interest, which is captured by the steady-state distribution of the agent. The formal synthesis of stochastic policies satisfying constraints on this distribution has been studied, but the derivation of deterministic policies for the same problem has received little attention. In this paper, we focus on this deterministic steady-state control problem, i.e., the problem of obtaining a deterministic policy that maximizes expected reward subject to linear constraints representing the desired steady-state behavior. Two integer linear programs are proposed and validated experimentally to solve this problem in unichain and multichain Markov decision processes. Finally, we prove that the problem is NP-hard even in the restricted setting where the MDP has deterministic transitions and only two actions.
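To convey the flavor of such formulations, the sketch below gives one natural integer-linear-programming encoding of the unichain case; it is an illustrative formulation under assumed per-state steady-state bounds \(\ell(s) \le u(s)\), not necessarily the exact programs developed in the paper. The variables \(x(s,a)\) denote the stationary state-action frequencies induced by a policy, and the binary variables \(\delta(s,a)\) restrict those frequencies to a single action per state.

\[
\begin{aligned}
\max_{x,\;\delta}\quad & \sum_{s\in S}\sum_{a\in A} R(s,a)\,x(s,a) && \text{(expected average reward)}\\
\text{s.t.}\quad & \sum_{a\in A} x(s',a) \;=\; \sum_{s\in S}\sum_{a\in A} P(s'\mid s,a)\,x(s,a) && \forall s'\in S \quad \text{(stationarity)}\\
& \sum_{s\in S}\sum_{a\in A} x(s,a) \;=\; 1 && \text{(total probability)}\\
& \ell(s) \;\le\; \sum_{a\in A} x(s,a) \;\le\; u(s) && \forall s\in S \quad \text{(steady-state constraints)}\\
& x(s,a) \;\le\; \delta(s,a),\quad \sum_{a\in A}\delta(s,a) = 1,\quad \delta(s,a)\in\{0,1\} && \forall s\in S,\ a\in A \quad \text{(determinism)}\\
& x(s,a) \;\ge\; 0 && \forall s\in S,\ a\in A.
\end{aligned}
\]

Without the binary variables \(\delta\), this is the standard linear program whose optimal solution yields a stochastic policy \(\pi(a\mid s) = x(s,a)/\sum_{a'} x(s,a')\) on states with positive stationary mass; the coupling constraints \(x(s,a)\le\delta(s,a)\) and \(\sum_{a}\delta(s,a)=1\) allow at most one action per state to carry mass, so reading the policy off \(\delta\) yields a deterministic controller. The multichain case requires additional care, since the steady-state distribution then depends on which recurrent class the agent enters.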
Additional information
This research was supported in part by the Air Force Office of Scientific Research through Grant FA9550-19-1-0177 and in part by the Air Force Research Laboratory, Rome, through Contract FA8750-17-S-7007.
About this article
Cite this article
Velasquez, A., Alkhouri, I., Subramani, K. et al. Optimal Deterministic Controller Synthesis from Steady-State Distributions. J Autom Reasoning 67, 7 (2023). https://doi.org/10.1007/s10817-022-09657-9