Multi-agent reinforcement learning based on local communication

Cluster Computing

Abstract

To address the locality and uncertainty of observations in large-scale multi-agent applications, the Decentralized Partially Observable Markov Decision Process (DEC-POMDP) model is adopted, and a novel multi-agent reinforcement learning algorithm based on local communication is proposed. In a distributed learning environment, the elements of reinforcement learning are difficult to describe effectively when each agent has only local observations, and each agent's learning behaviour is influenced by its teammates. Local communication governed by a consensus protocol is used to let the agents agree on the global observation of the environment. This eliminates part of the strategies generated by repeated observations, and the team gradually reaches a uniform opinion on the state of the observed event or object. The agents can thus approach a unique belief space regardless of whether each individual agent makes a complete or only a partial observation. Simulation results show that the learning strategy space is reduced and the learning speed is improved.
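The preview does not state which consensus protocol the authors use. As a minimal sketch under that uncertainty, the snippet below assumes a standard linear average-consensus update with Metropolis-Hastings weights: each agent holds a belief vector (a probability distribution over the possible states of the observed object) built from its own, possibly partial, observation and repeatedly mixes it with its neighbours' vectors. The function names `metropolis_weights` and `consensus`, the ring topology, and the belief values are hypothetical illustrations, not taken from the paper.

```python
"""Sketch: average consensus on local belief vectors (assumed protocol, not the paper's)."""
from typing import Dict, List

Belief = List[float]
# Agent id -> ids of agents it can talk to. Assumed undirected (symmetric) and connected.
Graph = Dict[int, List[int]]


def metropolis_weights(neighbors: Graph) -> Dict[int, Dict[int, float]]:
    """Build symmetric, doubly stochastic mixing weights from local degree info only."""
    weights: Dict[int, Dict[int, float]] = {}
    for i, nbrs in neighbors.items():
        row: Dict[int, float] = {}
        for j in nbrs:
            # Each agent only needs its own degree and its neighbour's degree.
            row[j] = 1.0 / (1 + max(len(nbrs), len(neighbors[j])))
        row[i] = 1.0 - sum(row.values())  # self-weight keeps each row summing to 1
        weights[i] = row
    return weights


def consensus(beliefs: Dict[int, Belief], neighbors: Graph, rounds: int = 50) -> Dict[int, Belief]:
    """Run synchronous consensus rounds; each agent mixes only its neighbours' beliefs."""
    w = metropolis_weights(neighbors)
    x = {i: list(b) for i, b in beliefs.items()}
    for _ in range(rounds):
        # The comprehension reads the previous round's x for every agent,
        # so all agents update simultaneously.
        x = {
            i: [sum(w[i][j] * x[j][k] for j in w[i]) for k in range(len(x[i]))]
            for i in x
        }
    return x


if __name__ == "__main__":
    # Four agents on a ring; each has a belief over 3 possible object states.
    ring: Graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    local = {
        0: [0.7, 0.2, 0.1],
        1: [0.6, 0.3, 0.1],
        2: [0.2, 0.5, 0.3],  # e.g. an agent with a poorer, partial view
        3: [0.5, 0.25, 0.25],
    }
    for agent, belief in sorted(consensus(local, ring).items()):
        print(agent, [round(p, 4) for p in belief])
```

On a connected communication graph, every agent's vector approaches the mean of the initial beliefs; in the four-agent ring above that is approximately [0.5, 0.3125, 0.1875], so all agents end up sharing one belief even though each started from a different local view. Metropolis weights are chosen here only because they require nothing beyond each agent's own degree and its neighbours' degrees, which matches a local-communication setting.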

Figs. 1–13 (images not included in this preview)


Author information

Correspondence to Wenxu Zhang.

About this article

Cite this article

Zhang, W., Ma, L. & Li, X. Multi-agent reinforcement learning based on local communication. Cluster Comput 22 (Suppl 6), 15357–15366 (2019). https://doi.org/10.1007/s10586-018-2597-x

  • DOI: https://doi.org/10.1007/s10586-018-2597-x
