ASN: action semantics network for multiagent reinforcement learning

Published in Autonomous Agents and Multi-Agent Systems (2023)

Abstract

In multiagent systems (MASs), each agent makes individual decisions but all contribute globally to the system’s evolution. Learning in MASs is difficult since each agent’s selection of actions must take place in the presence of other co-learning agents. Moreover, the environmental stochasticity and uncertainties increase exponentially with the number of agents. Previous works borrow various multiagent coordination mechanisms for use in deep learning architectures to facilitate multiagent coordination. However, none of them explicitly consider that different actions can have different influence on other agents, which we call the action semantics. In this paper, we propose a novel network architecture, named Action Semantics Network (ASN), that explicitly represents such action semantics between agents. ASN characterizes different actions’ influence on other agents using neural networks based on the action semantics between them. ASN can be easily combined with existing deep reinforcement learning (DRL) algorithms to boost their performance. Experimental results on StarCraft II micromanagement and Neural MMO show that ASN significantly improves the performance of state-of-the-art DRL approaches, compared with several other network architectures. We also successfully deploy ASN to a popular online MMORPG game called Justice Online, which indicates a promising future for ASN to be applied in even more complex scenarios.

Notes

  1. More details can be found at https://sites.google.com/view/asn-intro; the source code is available at https://github.com/wwxFromTju/ASN_cloud

  2. https://github.com/wwxFromTju/MA-RLlib

  3. Our ASN is compatible with extensions of QMIX, since these methods follow the QMIX structure; we therefore select QMIX as a representative baseline.

  4. The implementation details are not public since this is a commercial game and all details are closed-source.

References

  1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.

  2. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529.

  3. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In Proceedings of the 4th international conference on learning representations.

  4. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T. P., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., & Hassabis, D. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354–359.

  5. Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the fifteenth national conference on artificial intelligence and tenth innovative applications of artificial intelligence conference (pp. 746–752).

  6. Hu, J., Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the fifteenth international conference on machine learning (pp. 242–250).

  7. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2), 156–172.

  8. Hauwere, Y. D., Devlin, S., Kudenko, D., & Nowé, A. (2016). Context-sensitive reward shaping for sparse interaction multi-agent systems. The Knowledge Engineering Review, 31(1), 59–76.

  9. Yang, T., Wang, W., Tang, H., Hao, J., Meng, Z., Mao, H., Li, D., Liu, W., Chen, Y., & Hu, Y. (2021). An efficient transfer learning framework for multiagent reinforcement learning. In Advances in neural information processing systems (vol. 34, pp. 17037–17048).

  10. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, O. P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In Advances in neural information processing systems (pp. 6379–6390).

  11. Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi-agent policy gradients. In Proceedings of the thirty-second AAAI conference on artificial intelligence.

  12. Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., & Wang, J. (2018). Mean field multi-agent reinforcement learning. In Proceedings of the 35th international conference on machine learning (pp. 5567–5576).

  13. Stanley, H. E. (1971). Phase transitions and critical phenomena. Clarendon Press.

  14. Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., & Tuyls, K. (2018). Value-decomposition networks for cooperative multi-agent learning based on team reward. In Proceedings of the 17th international conference on autonomous agents and multiagent systems (pp. 2085–2087).

  15. Rashid, T., Samvelyan, M., Witt, C. S., Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the 35th international conference on machine learning (pp. 4292–4301).

  16. Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multiagent communication with backpropagation. In Advances in neural information processing systems (vol. 29, pp. 2244–2252).

  17. Singh, A., Jain, T., & Sukhbaatar, S. (2019). Individualized controlled continuous communication model for multiagent cooperative and competitive tasks. In Proceedings of the 7th international conference on learning representations.

  18. Zambaldi, V. F., Raposo, D., Santoro, A., Bapst, V., Li, Y., Babuschkin, I., Tuyls, K., Reichert, D. P., Lillicrap, T. P., Lockhart, E., Shanahan, M., Langston, V., Pascanu, R., Botvinick, M., Vinyals, O., & Battaglia, P. W. (2019). Deep reinforcement learning with relational inductive biases. In Proceedings of the 7th international conference on learning representations.

  19. Tacchetti, A., Song, H. F., Mediano, P. A. M., Zambaldi, V. F., Kramár, J., Rabinowitz, N. C., Graepel, T., Botvinick, M., & Battaglia, P. W. (2019). Relational forward models for multi-agent learning. In Proceedings of the 7th international conference on learning representations.

  20. Pachocki, J., Brockman, G., Raiman, J., Zhang, S., Pondé, H., Tang, J., Wolski, F., Dennison, C., Jozefowicz, R., Debiak, P., et al. (2018). OpenAI Five. https://blog.openai.com/openai-five

  21. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  22. Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C., Torr, P. H. S., Foerster, J. N., & Whiteson, S. (2019). The StarCraft multi-agent challenge. In Proceedings of the 18th international conference on autonomous agents and multiagent systems (pp. 2186–2188).

  23. Suarez, J., Du, Y., Isola, P., & Mordatch, I. (2019). Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784

  24. Wang, W., Yang, T., Liu, Y., Hao, J., Hao, X., Hu, Y., Chen, Y., Fan, C., & Gao, Y. (2020). Action semantics network: Considering the effects of actions in multiagent systems. In Proceedings of the 8th international conference on learning representations.

  25. Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the eleventh international conference on machine learning (pp. 157–163).

  26. Hansen, E. A., Bernstein, D. S., & Zilberstein, S. (2004). Dynamic programming for partially observable stochastic games. In Proceedings of the nineteenth national conference on artificial intelligence (pp. 709–715).

  27. Watkins, C. J. C. H., & Dayan, P. (1992). Technical note: Q-learning. Machine Learning, 8, 279–292.

  28. van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Proceedings of the thirtieth AAAI conference on artificial intelligence (pp. 2094–2100).

  29. Anschel, O., Baram, N., & Shimkin, N. (2017). Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning. In Proceedings of the 34th international conference on machine learning (pp. 176–185).

  30. Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. In Proceedings of the 34th international conference on machine learning (pp. 449–458).

  31. Dabney, W., Rowland, M., Bellemare, M. G., & Munos, R. (2018). Distributional reinforcement learning with quantile regression. In Proceedings of the thirty-second AAAI conference on artificial intelligence (pp. 2892–2901).

  32. Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., & Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd international conference on machine learning (pp. 1995–2003).

  33. Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In Proceedings of the 4th international conference on learning representations.

  34. Hessel, M., Modayil, J., van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M. G., & Silver, D. (2018). Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the thirty-second AAAI conference on artificial intelligence (pp. 3215–3222).

  35. Sosic, A., KhudaBukhsh, W. R., Zoubir, A. M., & Koeppl, H. (2017). Inverse reinforcement learning in swarm systems. In Proceedings of the 16th conference on autonomous agents and multiagent systems (pp. 1413–1421).

  36. Oh, K.-K., Park, M.-C., & Ahn, H.-S. (2015). A survey of multi-agent formation control. Automatica, 53, 424–440.

  37. Wang, W., Yang, T., Liu, Y., Hao, J., Hao, X., Hu, Y., Chen, Y., Fan, C., & Gao, Y. (2020). From few to more: Large-scale dynamic multiagent curriculum learning. In Proceedings of the AAAI conference on artificial intelligence (vol. 34, pp. 7293–7300).

  38. Jiang, J., & Lu, Z. (2018). Learning attentional communication for multi-agent cooperation. In Advances in neural information processing systems (pp. 7254–7264).

  39. Subramanian, S. G., Taylor, M. E., Crowley, M., & Poupart, P. (2021). Decentralized mean field games. arXiv preprint arXiv:2112.09099

  40. Wolpert, D. H., & Tumer, K. (2001). Optimal payoff functions for members of collectives. Advances in Complex Systems, 4(2–3), 265–280.

  41. Foerster, J. N., de Witt, C. A. S., Farquhar, G., Torr, P. H. S., Boehmer, W., & Whiteson, S. (2018). Multi-agent common knowledge reinforcement learning. arXiv preprint arXiv:1810.11702

  42. Du, Y., Han, L., Fang, M., Liu, J., Dai, T., & Tao, D. (2019). LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning. In Advances in neural information processing systems (pp. 4405–4416).

  43. Son, K., Kim, D., Kang, W. J., Hostallero, D., & Yi, Y. (2019). QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In Proceedings of the 36th international conference on machine learning (pp. 5887–5896).

  44. Rashid, T., Farquhar, G., Peng, B., & Whiteson, S. (2020). Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. In Advances in neural information processing systems.

  45. Yang, Y., Hao, J., Liao, B., Shao, K., Chen, G., Liu, W., & Tang, H. (2020). Qatten: A general framework for cooperative multiagent reinforcement learning. arXiv preprint arXiv:2002.03939

  46. Wang, J., Ren, Z., Liu, T., Yu, Y., & Zhang, C. (2021). QPLEX: Duplex dueling multi-agent Q-learning. In Proceedings of the 9th international conference on learning representations.

  47. Panait, L., Luke, S., & Wiegand, R. P. (2006). Biasing coevolutionary search for optimal multiagent behaviors. IEEE Transactions on Evolutionary Computation, 10(6), 629–645.

  48. Mahajan, A., Rashid, T., Samvelyan, M., & Whiteson, S. (2019). MAVEN: Multi-agent variational exploration. In Advances in neural information processing systems (pp. 7611–7622).

  49. Wang, T., Wang, J., Wu, Y., & Zhang, C. (2020). Influence-based multi-agent exploration. In Proceedings of the 8th international conference on learning representations.

  50. Yoo, B., Ningombam, D. D., Yi, S., Kim, H. W., Chung, E., Han, R., & Song, H. J. (2022). A novel and efficient influence-seeking exploration in deep multiagent reinforcement learning. IEEE Access, 10, 47741–47753.

  51. Pieroth, F. R., Fitch, K., & Belzner, L. (2022). Detecting influence structures in multi-agent reinforcement learning systems.

  52. Hu, S., Zhu, F., Chang, X., & Liang, X. (2021). UPDeT: Universal multi-agent RL via policy decoupling with transformers. In Proceedings of the 9th international conference on learning representations.

  53. Zhou, T., Zhang, F., Shao, K., Li, K., Huang, W., Luo, J., Wang, W., Yang, Y., Mao, H., Wang, B., & Li, D. (2021). Cooperative multi-agent transfer learning with level-adaptive credit assignment. arXiv preprint arXiv:2106.00517

  54. Chai, J., Li, W., Zhu, Y., Zhao, D., Ma, Z., Sun, K., & Ding, J. (2021). UNMAS: Multiagent reinforcement learning for unshaped cooperative scenarios. IEEE Transactions on Neural Networks and Learning Systems. https://doi.org/10.1109/TNNLS.2021.3105869

  55. Wang, T., Gupta, T., Mahajan, A., Peng, B., Whiteson, S., & Zhang, C. (2021). RODE: Learning roles to decompose multi-agent tasks. In Proceedings of the international conference on learning representations.

  56. Cao, J., Yuan, L., Wang, J., Zhang, S., Zhang, C., Yu, Y., & Zhan, D. (2021). LINDA: Multi-agent local information decomposition for awareness of teammates. arXiv preprint arXiv:2109.12508

  57. Hao, X., Wang, W., Mao, H., Yang, Y., Li, D., Zheng, Y., Wang, Z., & Hao, J. (2022). API: Boosting multi-agent reinforcement learning via agent-permutation-invariant networks. arXiv preprint arXiv:2203.05285

  58. Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient multi-agent deep reinforcement learning. In Proceedings of the 17th international conference on autonomous agents and MultiAgent systems (pp. 443–451).

  59. Gupta, J. K., Egorov, M., & Kochenderfer, M. (2017). Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the 16th international conference on autonomous agents and multiagent systems, workshops (pp. 66–83).

  60. Ha, D., Dai, A. M., & Le, Q. V. (2016). HyperNetworks. arXiv preprint arXiv:1609.09106

  61. Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., Korjus, K., Aru, J., Aru, J., & Vicente, R. (2017). Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4), e0172395.

  62. Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., & Jordan, M.I. (2018). Ray: A distributed framework for emerging AI applications. In Proceedings of the 13th USENIX symposium on operating systems design and implementation (OSDI 18) (pp. 561–577).

  63. Wu, Y., Mansimov, E., Grosse, R. B., Liao, S., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in neural information processing systems (vol. 30, pp. 5279–5288).

  64. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd international conference on machine learning (pp. 1928–1937).

  65. Yu, C., Velu, A., Vinitsky, E., Wang, Y., Bayen, A. M., & Wu, Y. (2021). The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955

  66. Sarafian, E., Keynan, S., & Kraus, S. (2021). Recomposing the reinforcement learning building blocks with hypernetworks. In Proceedings of the international conference on machine learning (pp. 9301–9312).

  67. Fu, J., Kumar, A., Soh, M., & Levine, S. (2019). Diagnosing bottlenecks in deep Q-learning algorithms. In Proceedings of the international conference on machine learning (pp. 2021–2030).

  68. Andrychowicz, M., Raichuk, A., Stanczyk, P., Orsini, M., Girgin, S., Marinier, R., Hussenot, L., Geist, M., Pietquin, O., Michalski, M., Gelly, S., & Bachem, O. (2021). What matters for on-policy deep actor-critic methods? A large-scale study. In Proceedings of the 9th international conference on learning representations.

  69. NetEase: Justice Online. (2018). https://n.163.com/index.html, https://www.mmobomb.com/news/netease-looks-to-bring-justice-online-west-this-year

  70. Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., & Dunning, I. (2018). IMPALA: Scalable distributed Deep-RL with importance weighted actor-learner architectures. In Proceedings of the international conference on machine learning (pp. 1407–1416).

Acknowledgements

This work is supported by the Major Program of the National Natural Science Foundation of China (Grant No. 92370132) and the National Key R&D Program of China (Grant No. 2022ZD0116402). Part of this work has taken place in the Intelligent Robot Learning (IRL) Lab at the University of Alberta, which is supported in part by research grants from the Alberta Machine Intelligence Institute (Amii); a Canada CIFAR AI Chair, Amii; Compute Canada; Huawei; Mitacs; and NSERC.

Author information

Contributions

TY wrote the main manuscript text. WW and TY prepared all experimental results. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Tianpei Yang or Jianye Hao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Environmental settings

A.1 StarCraft II

State Description In StarCraft II, we follow the settings of previous works [15, 22]. The local observation of each agent is drawn from its field of view, i.e., the circular area of the map surrounding the unit with a radius equal to its sight range. Each agent receives as input a vector consisting of the following features for every unit in its field of view (both allied and enemy): distance, relative x, relative y, and unit type. More details can be found at https://github.com/wwxFromTju/ASN_cloud or https://github.com/oxwhirl/smac.
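
To illustrate how such per-unit features could be assembled, here is a minimal sketch; the sight-range value, the Unit class, and the helper names are assumptions for illustration, not SMAC's actual API.

```python
from dataclasses import dataclass
import numpy as np

SIGHT_RANGE = 9.0  # assumed sight range; the real value depends on the unit type

@dataclass
class Unit:
    x: float
    y: float
    unit_type: int

def unit_features(me: Unit, other: Unit) -> np.ndarray:
    """Per-unit feature block described above: distance, relative x, relative y, unit type."""
    dx, dy = other.x - me.x, other.y - me.y
    dist = float(np.hypot(dx, dy))
    if dist > SIGHT_RANGE:                        # outside the field of view
        return np.zeros(4, dtype=np.float32)      # zero padding for unseen units
    return np.array([dist / SIGHT_RANGE, dx / SIGHT_RANGE,
                     dy / SIGHT_RANGE, float(other.unit_type)], dtype=np.float32)

def local_observation(me: Unit, others: list) -> np.ndarray:
    """Concatenate the per-unit blocks for all other (allied and enemy) units."""
    return np.concatenate([unit_features(me, u) for u in others])
```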

A.2 Neural MMO

State Description The map is a 10 × 10 grid of tiles (each tile can be of a different kind, e.g., rock or grass), on which two teams of agents (green and red) with 3 agents each compete. At the beginning of each episode, each agent spawns on one of the 10 × 10 tiles. The observation of an agent is a 43-dimensional vector. The first 8 dimensions describe the agent itself: time to live, HP, remaining food (set to 0), remaining water (set to 0), current position (x and y), the amount of damage suffered, and frozen state (1 or 0). The remaining 35 dimensions are divided equally among the other 5 agents: the first 14 of them describe the 2 teammates, followed by the descriptions of the 3 opponents. Each observed agent's information includes the relative position (x and y), whether it is a teammate (1 or 0), HP, remaining food, remaining water, and the frozen state.
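
A sketch of how this 43-dimensional vector could be laid out follows; the function and field names are illustrative, inferred from the description above rather than taken from the environment's code.

```python
import numpy as np

SELF_DIM, PER_AGENT_DIM, N_OTHERS = 8, 7, 5      # 8 + 5 * 7 = 43 dimensions

def build_observation(self_feats, teammate_blocks, opponent_blocks):
    """self_feats: 8 values (time to live, HP, food, water, x, y, damage, frozen).
    teammate_blocks: 2 blocks of 7 values; opponent_blocks: 3 blocks of 7 values,
    each (rel. x, rel. y, teammate flag, HP, food, water, frozen)."""
    obs = np.zeros(SELF_DIM + N_OTHERS * PER_AGENT_DIM, dtype=np.float32)
    obs[:SELF_DIM] = self_feats
    for k, block in enumerate(list(teammate_blocks) + list(opponent_blocks)):
        start = SELF_DIM + k * PER_AGENT_DIM     # teammates occupy dims 8-21
        obs[start:start + PER_AGENT_DIM] = block
    return obs
```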

Each agent chooses an action from a set of 14 discrete actions: stop; move left, right, up, or down; and three kinds of attack ("Melee", attack distance 2, damage 5; "Range", attack distance 4, damage 2; "Mage", attack distance 10, damage 1) directed at one of the three opponents.
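
The resulting action set (5 move/stop actions plus 3 attack styles × 3 targets = 14) can be enumerated as in the sketch below; the action names are illustrative, not the environment's API.

```python
# Illustrative enumeration of the 14 discrete actions; the attack parameters
# (distance, damage) are the ones quoted above.
MOVE_ACTIONS = ["stop", "move_left", "move_right", "move_up", "move_down"]
ATTACK_STYLES = {"melee": (2, 5), "range": (4, 2), "mage": (10, 1)}

ACTIONS = list(MOVE_ACTIONS)
for style in ATTACK_STYLES:
    for opponent in range(3):
        ACTIONS.append(f"{style}_opponent_{opponent}")

assert len(ACTIONS) == 14                        # 5 move/stop + 3 styles x 3 targets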

Each agent receives a penalty of \(-0.1\) if an attack fails, a \(-0.01\) reward at each tick, and a \(-10\) penalty for being killed. The game ends when one team dies or a fixed time limit is exceeded; agents belonging to the same team then receive the same reward, which is the difference between their team's total HP and that of the opposing team.
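
A minimal sketch of this reward scheme, using only the values stated above (function names are hypothetical):

```python
def step_reward(attack_failed: bool, killed: bool) -> float:
    """Per-tick reward shaping described above."""
    r = -0.01                                    # small penalty every tick
    if attack_failed:
        r -= 0.1                                 # failed attack
    if killed:
        r -= 10.0                                # the agent was killed
    return r

def terminal_team_reward(own_team_hp: float, opponent_team_hp: float) -> float:
    """Shared terminal reward: difference between the two teams' total HP."""
    return own_team_hp - opponent_team_hp
```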

Appendix B: Network structure and parameter settings

B.1 StarCraft II

Network Structure The details of the different network structures for StarCraft II are shown in Fig. 19. The vanilla network (Fig. 19a) of each agent i contains two fully-connected hidden layers with 64 units and one GRU layer with 64 units, taking \(o_t^i\) as input; the output layer is a fully-connected layer that outputs the Q-value of each action. The attention network (Fig. 19b) of each agent i contains two isolated fully-connected layers with 64 units, taking \(o_t^i\) as input and computing a standard attention value for each dimension of the input, followed by a GRU layer with 64 units; the output contains the Q-value of each action. The entity-attention network (Fig. 19c) is similar to Fig. 19b, except that the attention weights are computed over each \(o_t^{i,j}\). The dueling network (Fig. 19d) is the same as the vanilla network except for the output layer, which outputs the advantage of each action together with the state value. Our homogeneous ASN (Fig. 19e) of each agent i contains two sub-modules: the first is \(O2A^i\), which contains two fully-connected layers with 32 units, taking \(o_t^i\) as input, followed by a GRU layer with 32 units; the second is a parameter-sharing sub-module containing two fully-connected layers with 32 units, taking each \(o_t^{i,j}\) as input, followed by a GRU layer with 32 units. The output layer outputs the Q-value of each action.

Fig. 19 Various network structures on a StarCraft II 8m map
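
To make the wiring concrete, below is a minimal PyTorch sketch of the homogeneous ASN sub-modules with the layer sizes quoted above. How the two embeddings are combined into Q-values (a linear head for actions that do not target other agents, and a dot product with each per-agent embedding for agent-directed actions) is an assumption of this sketch rather than a detail spelled out here; the reference implementation is at https://github.com/wwxFromTju/ASN_cloud.

```python
import torch
import torch.nn as nn

class HomogeneousASN(nn.Module):
    """Sketch of the homogeneous ASN: two 32-unit FC layers plus a 32-unit GRU in
    each sub-module. The Q-value combination (linear head + dot products) is an
    assumption of this sketch."""

    def __init__(self, obs_dim, per_agent_dim, n_self_actions, hidden=32):
        super().__init__()
        self.o2a = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.o2a_gru = nn.GRUCell(hidden, hidden)
        # Parameter-sharing sub-module applied to every o_t^{i,j}
        self.o2e = nn.Sequential(nn.Linear(per_agent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.o2e_gru = nn.GRUCell(hidden, hidden)
        self.self_head = nn.Linear(hidden, n_self_actions)

    def forward(self, obs, per_agent_obs, h_a, h_e):
        # obs: (B, obs_dim), per_agent_obs: (B, N, per_agent_dim)
        # h_a: (B, hidden), h_e: (B, N, hidden) are the GRU hidden states
        e_a = self.o2a_gru(self.o2a(obs), h_a)                      # (B, hidden)
        B, N, _ = per_agent_obs.shape
        e_e = self.o2e_gru(self.o2e(per_agent_obs).reshape(B * N, -1),
                           h_e.reshape(B * N, -1)).reshape(B, N, -1)
        q_self = self.self_head(e_a)                                # non-directed actions
        q_pair = torch.einsum("bh,bnh->bn", e_a, e_e)               # one Q per target agent
        return torch.cat([q_self, q_pair], dim=-1), e_a, e_e
```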

Parameter Settings Here we provide the hyperparameters for StarCraft II, as shown in Table 4; more details can be found at https://github.com/wwxFromTju/ASN_cloud.

Table 4 Hyperparameter settings for StarCraft II

B.2 Neural MMO

Network structure The details of the vanilla, attention, and entity-attention networks for Neural MMO are shown in Fig. 20a–c; each contains an actor network and a critic network. All actors are similar to those for StarCraft II in Fig. 19, except that the GRU layer is removed and the output is the probability of choosing each action. All critics are the same as the one shown in Fig. 20a. Since in Neural MMO each agent has multiple actions that directly influence each other agent (i.e., the three kinds of attack actions), we test two ASN variants: one (Fig. 20d) is the Multi-action ASN mentioned in the previous section, which shares the first-layer parameters among these attack actions; the other (Fig. 20e) is the basic homogeneous ASN, which does not share the first-layer parameters among them.

Fig. 20 Various network structures on Neural MMO
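
The only difference between the two variants is whether the first fully-connected layer of the per-opponent sub-module is shared across the three attack styles. A hedged sketch of the Multi-action variant is given below; the module and parameter names are illustrative.

```python
import torch.nn as nn

class MultiActionO2E(nn.Module):
    """Per-opponent sub-module of the Multi-action ASN variant: the first FC layer
    is shared across the three attack styles (Melee/Range/Mage), while each style
    keeps its own second layer. The basic homogeneous ASN would instead use fully
    separate stacks per attack style."""

    def __init__(self, per_agent_dim, hidden=32, n_attack_styles=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(per_agent_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            for _ in range(n_attack_styles))

    def forward(self, o_ij):                      # o_ij: (B, per_agent_dim)
        h = self.shared(o_ij)
        return [head(h) for head in self.heads]   # one embedding per attack style
```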

Parameter settings Here we provide the hyperparameters for Neural MMO, as shown in Table 5.

Table 5 Parameters of all algorithms

Appendix C: Experimental results

The following results present the performance of QMIX-ASN and vanilla QMIX on different StarCraft II maps when a manual rule is added that forbids agents from choosing invalid actions (Fig. 21).
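
A generic way to implement such a rule is to mask the Q-values of invalid actions before action selection; the sketch below illustrates that idea and is not the exact rule used in the experiments.

```python
import torch

def masked_q_values(q_values: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
    """Replace the Q-values of invalid actions (valid_mask == False) with a very
    low value so that greedy or epsilon-greedy selection never picks them."""
    return q_values.masked_fill(~valid_mask, -1e9)

# Example: 4 agents, 14 actions, attack actions (indices 5-13) currently invalid.
# q = torch.randn(4, 14)
# mask = torch.ones(4, 14, dtype=torch.bool); mask[:, 5:] = False
# greedy = masked_q_values(q, mask).argmax(dim=-1)   # always a move/stop action
```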

Fig. 21 Win rates of QMIX-ASN and vanilla QMIX on the 5m and 8m StarCraft II maps

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, T., Wang, W., Hao, J. et al. ASN: action semantics network for multiagent reinforcement learning. Auton Agent Multi-Agent Syst 37, 45 (2023). https://doi.org/10.1007/s10458-023-09628-3
