Reinforcement learning architecture for cyber–physical–social AI: state-of-the-art and perspectives


Abstract

As an extension of cyber–physical systems (CPSs), cyber–physical–social systems (CPSSs) seamlessly integrate cyber space, physical space, and social space. CPSSs provide a more comprehensive smart perception space for Reinforcement Learning (RL), paving the way for a revolution in Artificial Intelligence (AI) that urgently requires innovation in computing system architecture. This paper provides a comprehensive review of, and perspectives on, RL architectures for collaborative computing systems in CPSSs. First, we analyze the features of CPSS AI from a data perspective, including multi-modal data fusion across spaces, rule discovery and representation, collaborative computing architectures for multiple spaces, and intelligent decision-making (policy) discovery. We then propose using an action-aware transition tensor to fuse CPSS data for collaborative RL. Next, the typical architectures and methods of RL are surveyed. Furthermore, a tensor-based unified collaborative reinforcement learning architecture is proposed for CPSS AI, covering both the architecture itself and the optimal policy solution. Finally, we summarize the paper and discuss future work.
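To make the central construct concrete, below is a minimal sketch of an action-aware transition tensor for collaborative RL, assuming discretized per-space observations and synthetic experience. All names and dimensions (fuse_state, N_CYBER, the toy dynamics) are illustrative assumptions, not the paper's implementation: the tensor T[a, s, s'] indexes transition probabilities by action over a joint state fused from cyber, physical, and social observations, and a greedy policy is then read off by value iteration.

# Minimal sketch (assumptions, not the paper's method): an action-aware
# transition tensor T[a, s, s'] over a joint CPSS state fused from
# discretized cyber, physical, and social observations.
import numpy as np

N_CYBER, N_PHYS, N_SOC, N_ACT = 4, 5, 3, 2  # illustrative sizes
N_STATES = N_CYBER * N_PHYS * N_SOC

def fuse_state(c: int, p: int, s: int) -> int:
    """Fuse one discretized observation per space into a joint state index."""
    return (c * N_PHYS + p) * N_SOC + s

rng = np.random.default_rng(0)
counts = np.ones((N_ACT, N_STATES, N_STATES))  # Laplace-smoothed counts
for _ in range(10_000):  # synthetic multi-space experience for the sketch
    s = fuse_state(rng.integers(N_CYBER), rng.integers(N_PHYS), rng.integers(N_SOC))
    a = int(rng.integers(N_ACT))
    s_next = (s + a + int(rng.integers(3))) % N_STATES  # toy dynamics
    counts[a, s, s_next] += 1
T = counts / counts.sum(axis=2, keepdims=True)  # T[a, s, :] = P(s' | s, a)

# Value iteration over the tensor: Q[a, s] = R[a, s] + gamma * T[a, s, :] @ V.
R = rng.normal(size=(N_ACT, N_STATES))  # placeholder reward model
V, gamma = np.zeros(N_STATES), 0.9
for _ in range(500):
    Q = R + gamma * np.einsum("asn,n->as", T, V)
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
policy = Q.argmax(axis=0)  # greedy action per fused CPSS state

A real CPSS deployment would estimate T from heterogeneous logs rather than synthetic draws; the point of the tensor form is that each action's transition matrix remains an addressable slice, which is what makes decomposition and incremental updates of the fused model natural.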





Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants 62166047, 62101481, 62002313, and 62261060.

Author information

Corresponding author

Correspondence to Puming Wang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, X., Wang, P., Jin, X. et al. Reinforcement learning architecture for cyber–physical–social AI: state-of-the-art and perspectives. Artif Intell Rev 56, 12655–12688 (2023). https://doi.org/10.1007/s10462-023-10450-2

