Abstract
As an extension of cyber–physical systems (CPSs), cyber–physical–social systems (CPSSs) seamlessly integrate cyber space, physical space, and social space. CPSSs provide a more comprehensive perceptual smart space for reinforcement learning (RL), promising a revolution in artificial intelligence (AI) that urgently requires innovation in computing system architecture. This paper provides a comprehensive review of, and perspectives on, RL architectures for collaborative computing systems in CPSSs. First, we analyze the features of CPSS AI from a data perspective, including multi-modal data fusion across multiple spaces, rule discovery and representation, collaborative computing architectures for multiple spaces, and intelligent decision-making (policy) discovery. We then propose the action-aware transition tensor to fuse CPSS data for collaborative RL, and survey the typical architectures and methods of RL. Furthermore, a tensor-based unified collaborative reinforcement computing architecture is proposed for CPSS AI, covering both the architecture and its optimal policy solution. Finally, we summarize the paper and discuss future work.
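To make the "transition tensor" idea concrete, the following is a minimal sketch of a standard MDP transition tensor P[s, a, s'] driving tabular value iteration. The sizes and random data are hypothetical; the paper's action-aware transition tensor would additionally index fused multi-space (cyber, physical, social) features, which this plain-MDP sketch does not attempt to model.

```python
import numpy as np

# Hypothetical sizes: S states, A actions. The paper's CPSS tensor would
# carry extra modes for multi-space features; here we keep a plain MDP.
S, A = 4, 2
rng = np.random.default_rng(0)

# Action-aware transition tensor: P[s, a, s'] = Pr(s' | s, a).
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize each (s, a) slice over s'

R = rng.random((S, A))  # reward for taking action a in state s
gamma = 0.9

# Value iteration written directly on the tensor:
#   Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
V = np.zeros(S)
for _ in range(500):
    Q = R + gamma * np.einsum('kas,s->ka', P, V)
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)  # greedy policy recovered from the converged Q
```

The appeal of the tensor form is that the Bellman backup collapses into a single contraction (`einsum`), which is the kind of operation a collaborative computing architecture can distribute or decompose.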
Acknowledgements
This work was supported by National Natural Science Foundation Project under Grants 62166047, 62101481, 62002313 and 62261060.
About this article
Cite this article
Li, X., Wang, P., Jin, X. et al. Reinforcement learning architecture for cyber–physical–social AI: state-of-the-art and perspectives. Artif Intell Rev 56, 12655–12688 (2023). https://doi.org/10.1007/s10462-023-10450-2