Open and real-world human-AI coordination by heterogeneous training with communication

Guan, Cong; Xue, Ke; Fan, Chunpeng; Chen, Feng; Zhang, Lichao; Yuan, Lei; Qian, Chao; Yu, Yang

doi:10.1007/s11704-024-3797-6

Research Article
Published: 22 November 2024

Volume 19, article number 194314, (2025)
Frontiers of Computer Science

Cong Guan^1,2,
Ke Xue^1,2,
Chunpeng Fan^3,
Feng Chen^1,2,
Lichao Zhang^3,
Lei Yuan^1,2,3,
Chao Qian^1,3 &
Yang Yu^1,2,3

Abstract

Human-AI coordination aims to develop AI agents capable of coordinating effectively with human partners, making it a crucial aspect of cooperative multi-agent reinforcement learning (MARL). Achieving satisfactory performance from such agents is a long-standing challenge. Recently, ad hoc teamwork and zero-shot coordination have shown promising advances in open-world settings, which require agents to coordinate efficiently with a range of unseen human partners. However, these methods usually assume homogeneity between the agent and the partner, an overly idealistic assumption that deviates from real-world conditions. To facilitate the practical deployment of human-AI coordination in open and real-world environments, we propose ORCBench, the first benchmark for open and real-world human-AI coordination (ORC). ORCBench includes widely used human-AI coordination environments. Notably, ORCBench considers heterogeneity between AI agents and partners, encompassing variations in capabilities and observations, which aligns more closely with real-world applications. Furthermore, we introduce a framework for ORC called Heterogeneous training with Communication (HeteC). HeteC builds upon a heterogeneous training framework and enhances partner population diversity by using mixed partner training and frozen historical partners. Additionally, HeteC incorporates a communication module that enables human partners to communicate with AI agents, mitigating the adverse effects of partially observable environments. Through a series of experiments, we demonstrate the effectiveness of HeteC in improving coordination performance. Our contribution serves as an initial but important step towards addressing the challenges of ORC.
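
The idea of mixing current partners with frozen historical partners can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how HeteC stores or samples partners, so the `Partner` and `PartnerPool` classes, the `snapshot` and `sample` methods, and the `p_history` mixing probability are all hypothetical names chosen for illustration. The sketch only shows one plausible way a training loop might draw partners from both groups.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partner:
    """Hypothetical stand-in for a partner policy; `params` mimics its weights."""
    name: str
    params: List[float]
    frozen: bool = False


@dataclass
class PartnerPool:
    """Assumed pool that mixes trainable partners with frozen snapshots."""
    current: List[Partner]
    history: List[Partner] = field(default_factory=list)

    def snapshot(self, step: int) -> None:
        # Store a deep copy of each current partner so that later updates
        # to the live partner cannot alter the historical checkpoint.
        for partner in self.current:
            frozen = copy.deepcopy(partner)
            frozen.name = f"{partner.name}@{step}"
            frozen.frozen = True
            self.history.append(frozen)

    def sample(self, rng: random.Random, p_history: float = 0.5) -> Partner:
        # With probability p_history (an assumed hyperparameter) return a
        # frozen historical partner; otherwise return a current partner.
        if self.history and rng.random() < p_history:
            return rng.choice(self.history)
        return rng.choice(self.current)


# Usage: take a snapshot, keep "training" the live partner, then sample.
pool = PartnerPool(current=[Partner("a", [0.0]), Partner("b", [1.0])])
pool.snapshot(step=100)
pool.current[0].params[0] = 5.0  # simulated update to the live partner
partner = pool.sample(random.Random(0))
```

Under these assumptions, the snapshots keep earlier behaviours available as training partners even after the live partners have changed, which is one way a population could stay diverse over time.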

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2020AAA0107200), the National Natural Science Foundation of China (Grant Nos. 61921006, 61876119, 62276126), and the Natural Science Foundation of Jiangsu (BK20221442). We thank Lihe Li and Ziqian Zhang for their useful suggestions and discussions.

Author information

Authors and Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Cong Guan, Ke Xue, Feng Chen, Lei Yuan, Chao Qian & Yang Yu
School of Artificial Intelligence, Nanjing University, Nanjing, 210023, China
Cong Guan, Ke Xue, Feng Chen, Lei Yuan & Yang Yu
Polixir Technologies, Nanjing, 211106, China
Chunpeng Fan, Lichao Zhang, Lei Yuan, Chao Qian & Yang Yu

Corresponding author

Correspondence to Yang Yu.

Ethics declarations

Competing interests: The authors declare that they have no competing interests or financial conflicts to disclose.

Additional information

Cong Guan received the BSc and MSc degrees from the School of Mechanical Engineering and Automation, Northeastern University, China. He is currently pursuing the PhD degree with the Department of Computer Science and Technology, Nanjing University, China. His current research interests mainly include machine learning, reinforcement learning, and multi-agent reinforcement learning.

Ke Xue received the BSc degree in Mathematics and Applied Mathematics from the School of Mathematics, Sun Yat-Sen University, China in 2019. He is currently pursuing the PhD degree with the School of Artificial Intelligence, Nanjing University, China. His current research interests mainly include machine learning and black-box optimization.

Chunpeng Fan received his MSc degree in communication engineering from Liaoning University of Technology, China in 2017. He is currently working at Polixir Technologies. His research interests include multi-agent reinforcement learning and multi-agent systems.

Feng Chen received his BSc degree from the School of Artificial Intelligence, Nanjing University, China in 2022. He is currently pursuing the MSc degree with the School of Artificial Intelligence, Nanjing University, China. His research interests include multi-agent reinforcement learning and multi-agent systems.

Lichao Zhang received his MSc degree in Agricultural Electrification and Automation from Shihezi University, China in 2018. He is currently working at Polixir Technologies. His research interests include multi-agent reinforcement learning and multi-agent systems.

Lei Yuan received the BSc degree from the Department of Electronic Engineering, Tsinghua University in 2016, and the MSc degree from Chinese Aeronautical Establishment, China in 2019. He is currently pursuing the PhD degree with the Department of Computer Science and Technology, Nanjing University, China. His current research interests mainly include machine learning, reinforcement learning, and multi-agent reinforcement learning.

Chao Qian received the PhD degree from the Department of Computer Science and Technology, Nanjing University, China in 2015, and is currently an associate professor at the School of Artificial Intelligence, Nanjing University, China. His research interests are mainly theoretical analysis of evolutionary algorithms (EAs), design of safe and efficient EAs, and evolutionary learning. He is an associate editor of IEEE Transactions on Evolutionary Computation and of SCIENCE CHINA Information Sciences. He has regularly given tutorials and co-chaired special sessions at leading evolutionary computation conferences (CEC, GECCO, PPSN), and was invited to give an Early Career Spotlight Talk, “Towards Theoretically Grounded Evolutionary Learning”, at IJCAI 2022.

Yang Yu received the PhD degree from the Department of Computer Science and Technology, Nanjing University, China in 2011, and is currently a professor at the School of Artificial Intelligence, Nanjing University, China. His research interests include machine learning, mainly reinforcement learning and derivative-free optimization for learning. Prof. Yu was granted the CCF-IEEE CS Young Scientist Award in 2020, was recognized as one of AI’s 10 to Watch by IEEE Intelligent Systems, and received the PAKDD Early Career Award in 2018. His teams won the 2018 OpenAI Retro Contest on transfer reinforcement learning and the 2021 ICAPS Learning to Run a Power Network Challenge with Trust. He has served as an area chair for NeurIPS, ICML, IJCAI, AAAI, and other conferences.

Electronic supplementary material