Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (No. 2023JBZX011) and the Aeronautical Science Foundation of China (No. 202300010M5001).
Ethics declarations
Competing interests
The authors declare that they have no competing interests or financial conflicts to disclose.
Cite this article
Han, S., Zhang, H., Wu, H. et al. Multi-constraint reinforcement learning in complex robot environments. Front. Comput. Sci. 19, 198353 (2025). https://doi.org/10.1007/s11704-024-40682-6