A stable actor-critic algorithm for solving robotic tasks with multiple constraints

Zhao, Peiyao; Zhu, Fei; Liu, Quan; Ling, Xinghong

doi:10.1007/s11704-022-1612-9

Letter
Published: 06 December 2022

Volume 17, article number 174328, (2023)

Frontiers of Computer Science

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61303108), the Natural Science Foundation of Jiangsu Province (BK20211102), the Suzhou Key Industries Technological Innovation-Prospective Applied Research Project (SYG201804), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Author information

Authors and Affiliations

School of Computer Science and Technology, Soochow University, Suzhou, 215006, China
Peiyao Zhao, Fei Zhu, Quan Liu & Xinghong Ling

Corresponding author

Correspondence to Fei Zhu.

Additional information

Supporting information: The supporting information is available online at https://journal.hep.com.cn and https://link.springer.com.

Electronic supplementary material

Appendix