
Reinforcement Learning in Cyclic Environmental Changes for Agents in Non-Communicative Environments: A Theoretical Approach

  • Conference paper
Explainable and Transparent AI and Multi-Agent Systems (EXTRAAMAS 2023)

Abstract

In non-communicative, dynamic environments, multi-agent reinforcement learning must adapt to environmental changes by transferring learning outcomes. Profit minimizing reinforcement learning with the oblivion of memory (PMRL-OM) enables agents to learn a cooperative policy from learning dynamics rather than communicated information, letting them adapt to changes in the other agents' behaviors without any designed inter-agent relationships or communication rules. This makes it easy to add robots to a multi-robot system while preserving cooperation. However, PMRL-OM can handle long-term dynamic changes but not short-term ones, because it relies on outcomes accumulated over many trials. This paper takes cyclic environmental changes as a case of short-term change, and aims both to improve performance under such changes and to analyze the rationality of the approach theoretically. Specifically, we extend PMRL-OM based on an analysis of the PMRL-OM approach. Our experiments evaluated the proposed method on a navigation task in a maze-type environment undergoing cyclic environmental change; the results show that the proposed method improves performance and adapts to the cyclic change sooner than the existing PMRL-OM method. In addition, the theoretical analysis not only establishes the rationality of PMRL-OM but also suggests optimal parameter values for the proposed method. The proposed method contributes to explainable AI (XAI) by exposing the agents' precise profits and by providing a rational justification for the approach.
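The abstract describes PMRL-OM only at a high level. As a reading aid, the following is a minimal Python sketch of the core idea as stated above: each agent estimates a "profit" for each goal from its own learning dynamics, targets the goal with the minimal profit so that agents spread over goals without communicating, and forgets outcomes older than a fixed window (the "oblivion of memory") so that estimates can track environmental change. All names and details here (PMRLOMAgentSketch, window_size, the gamma**steps profit metric, the 0/1 internal reward) are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict, deque


class PMRLOMAgentSketch:
    """Illustrative sketch (not the authors' code) of the PMRL-OM idea:
    profit-minimizing target selection from observed learning dynamics,
    with bounded memory ("oblivion of memory") for adapting to change."""

    def __init__(self, goals, actions, window_size=20,
                 alpha=0.1, gamma=0.9, epsilon=0.1):
        self.goals = goals
        self.actions = actions
        # Only the most recent `window_size` outcomes per goal are kept;
        # the deque silently forgets older ones (assumed forgetting rule).
        self.history = {g: deque(maxlen=window_size) for g in goals}
        self.q = defaultdict(float)  # Q-values keyed by (state, action)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def record_outcome(self, goal, steps_taken):
        """Log how many steps the agent needed to reach `goal` this trial."""
        self.history[goal].append(steps_taken)

    def estimated_profit(self, goal):
        """Assumed profit metric: mean discounted return gamma**steps over
        the remembered window; reaching a goal faster means higher profit."""
        h = self.history[goal]
        if not h:
            return float("inf")  # unexplored goals are never the minimum
        return sum(self.gamma ** s for s in h) / len(h)

    def select_target(self):
        """Profit-minimizing choice: take the goal with the lowest estimated
        profit, leaving higher-profit goals to the other agents."""
        return min(self.goals, key=self.estimated_profit)

    def internal_reward(self, reached_goal, target):
        """Internal reward steers Q-learning toward the chosen target."""
        return 1.0 if reached_goal == target else 0.0

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

    def act(self, state):
        """Epsilon-greedy action selection over the learned Q-values."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])
```

In this sketch, the bounded `maxlen` deque is what would let an agent follow short-term change: once the environment cycles, outcomes from the previous phase age out of the window within `window_size` trials and stop biasing the profit estimate, which mirrors the paper's stated motivation for extending PMRL-OM to cyclic environmental changes.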

This research was supported by JSPS KAKENHI Grant Number JP21K17807.



Author information

Corresponding author

Correspondence to Fumito Uwano.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Uwano, F., Takadama, K. (2023). Reinforcement Learning in Cyclic Environmental Changes for Agents in Non-Communicative Environments: A Theoretical Approach. In: Calvaresi, D., et al. (eds.) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2023. Lecture Notes in Computer Science, vol. 14127. Springer, Cham. https://doi.org/10.1007/978-3-031-40878-6_9


  • DOI: https://doi.org/10.1007/978-3-031-40878-6_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40877-9

  • Online ISBN: 978-3-031-40878-6

  • eBook Packages: Computer Science, Computer Science (R0)
