Context-Aware Multi-agent Coordination with Loose Couplings and Repeated Interaction

  • Conference paper
Distributed Artificial Intelligence (DAI 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12547)

Abstract

Coordination among multiple agents arises in many areas of industry and society. Despite recent advances, the problem remains challenging because of its combinatorial nature. First, the joint action set grows exponentially with the number of agents, which makes it hard to search effectively and to balance exploration against exploitation. Second, maximizing jointly over all agents’ actions is computationally intractable. To tackle these challenges, we exploit side information and loose couplings, i.e., conditional independence between agents, both of which are often available in coordination tasks. We make several key contributions in this paper. First, we formulate the repeated multi-agent coordination problem as a multi-agent contextual bandit problem to balance the exploration-exploitation trade-off. Second, we propose a novel algorithm called MACUCB, which uses a modified zooming technique to improve context exploitation and a variable elimination technique to perform the maximization efficiently by exploiting the loose couplings. Third, we propose two enhancements to MACUCB with improved theoretical guarantees. Fourth, we derive theoretical bounds on the regret of each algorithm. Finally, to demonstrate the effectiveness of our methods, we apply MACUCB and its variants to a realistic cloudlet resource rental problem, in which cloudlets must coordinate their computation resources to optimize quality of service at low cost. We evaluate our approaches on a real-world dataset, and the results show that MACUCB and its variants significantly outperform the benchmark methods.
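
To make the role of loose couplings concrete, the following is a minimal sketch of generic variable elimination over local payoff tables. It is not the MACUCB algorithm from the paper; it only illustrates how a joint maximization can be carried out one agent at a time when the payoff decomposes into small local terms. The function name `eliminate_max`, the agent names, and the payoff values are all hypothetical and chosen for illustration.

```python
from itertools import product


def eliminate_max(factors, domains, order):
    """Maximize a sum of local payoff tables by variable elimination.

    factors: list of (scope, table); scope is a tuple of agent names and
             table maps a tuple of actions (aligned with scope) to a payoff.
    domains: dict mapping each agent to its list of actions.
    order:   elimination order that covers every agent.
    Returns (best_value, best_joint_action).
    """
    factors = list(factors)
    best_response = []  # (agent, neighbour scope, argmax table) per elimination
    for agent in order:
        touching = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        # Remaining agents that share at least one local payoff with `agent`.
        scope = tuple(sorted({a for s, _ in touching for a in s if a != agent}))
        new_table, argmax = {}, {}
        for ctx in product(*(domains[a] for a in scope)):
            assign = dict(zip(scope, ctx))
            best_val, best_act = None, None
            for act in domains[agent]:
                assign[agent] = act
                val = sum(t[tuple(assign[a] for a in s)] for s, t in touching)
                if best_val is None or val > best_val:
                    best_val, best_act = val, act
            new_table[ctx] = best_val
            argmax[ctx] = best_act
        # Replace the eliminated agent's factors by one factor over its neighbours.
        factors.append((scope, new_table))
        best_response.append((agent, scope, argmax))
    # Every remaining factor now has an empty scope.
    best_value = sum(t[()] for _, t in factors)
    # Backward pass: recover each agent's action from its neighbours' choices.
    joint = {}
    for agent, scope, argmax in reversed(best_response):
        joint[agent] = argmax[tuple(joint[a] for a in scope)]
    return best_value, joint


# Hypothetical chain a1 - a2 - a3 with binary actions and made-up payoffs.
domains = {"a1": [0, 1], "a2": [0, 1], "a3": [0, 1]}
factors = [
    (("a1", "a2"), {(0, 0): 1, (0, 1): 4, (1, 0): 2, (1, 1): 0}),
    (("a2", "a3"), {(0, 0): 3, (0, 1): 0, (1, 0): 1, (1, 1): 2}),
]
value, joint = eliminate_max(factors, domains, order=["a1", "a2", "a3"])
```

In this toy chain each elimination step only enumerates the actions of one agent and its direct neighbours, so the cost depends on the size of the largest local neighbourhood rather than on the full joint action set.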

Supported by the Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University.

Corresponding author

Correspondence to Feifei Lin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Lin, F., He, X., An, B. (2020). Context-Aware Multi-agent Coordination with Loose Couplings and Repeated Interaction. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham. https://doi.org/10.1007/978-3-030-64096-5_8

  • DOI: https://doi.org/10.1007/978-3-030-64096-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64095-8

  • Online ISBN: 978-3-030-64096-5

  • eBook Packages: Computer Science, Computer Science (R0)
