Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61303108); Natural Science Foundation of Jiangsu Province (BK20211102); Suzhou Key Industries Technological Innovation-Prospective Applied Research Project (SYG201804); A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.
Additional information
Supporting information The supporting information is available online at https://journal.hep.com.cn and https://link.springer.com.
About this article
Cite this article
Zhao, P., Zhu, F., Liu, Q. et al. A stable actor-critic algorithm for solving robotic tasks with multiple constraints. Front. Comput. Sci. 17, 174328 (2023). https://doi.org/10.1007/s11704-022-1612-9
DOI: https://doi.org/10.1007/s11704-022-1612-9