Context-Aware Multi-agent Coordination with Loose Couplings and Repeated Interaction

Lin, Feifei; He, Xu; An, Bo

doi:10.1007/978-3-030-64096-5_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12547)

Included in the following conference series:

International Conference on Distributed Artificial Intelligence

Abstract

Coordination between multiple agents arises in many areas of industry and society. Despite some recent advances, the problem remains challenging because of its combinatorial nature. First, because the joint action set grows exponentially with the number of agents, it is hard to search effectively and to strike the right balance between exploration and exploitation. Second, maximizing jointly over all agents’ actions is computationally intractable. To tackle these challenges, we exploit side information (context) and loose couplings, i.e., conditional independence between agents, both of which are often available in coordination tasks. We make several key contributions in this paper. First, we formulate the repeated multi-agent coordination problem as a multi-agent contextual bandit problem in order to balance the exploration-exploitation trade-off. Second, we propose a novel algorithm called MACUCB, which uses a modified zooming technique to improve how context is exploited and a variable elimination technique to perform the maximization efficiently by exploiting the loose couplings. Third, we propose two enhancements to MACUCB with improved theoretical guarantees. Fourth, we derive regret bounds for each of the algorithms. Finally, to demonstrate the effectiveness of our methods, we apply MACUCB and its variants to a realistic cloudlet resource rental problem, in which cloudlets must coordinate their computation resources to optimize the quality of service at low cost. We evaluate our approaches on a real-world dataset, and the results show that MACUCB and its variants significantly outperform other benchmarks.
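The abstract names two mechanisms: optimistic (UCB-style) estimates to balance exploration and exploitation, and variable elimination to maximize over joint actions by exploiting loose couplings. The Python sketch below illustrates only the general idea under stated assumptions and is not the authors' MACUCB. It assumes the team reward decomposes into a sum of local rewards over small agent groups, each observed separately, uses a plain UCB1 bonus per (group, local joint action), and omits the context and the modified zooming technique entirely. The names `eliminate_max` and `FactoredUCB` and the toy payoff tables `q01` and `q12` are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Tuple

Scope = Tuple[int, ...]               # agents that a local payoff depends on
Table = Dict[Tuple[int, ...], float]  # local joint action -> value


def eliminate_max(num_actions: List[int],
                  factors: List[Tuple[Scope, Table]],
                  order: List[int]) -> Tuple[Dict[int, int], float]:
    """Max-sum variable elimination: maximize sum_f Q_f(a_f) over joint actions."""
    factors = list(factors)
    trace = []  # (eliminated agent, scope of new factor, best action per assignment)
    for agent in order:
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        scope = tuple(sorted({v for s, _ in involved for v in s} - {agent}))
        table: Table = {}
        best: Dict[Tuple[int, ...], int] = {}
        # For each assignment of the remaining neighbours, maximize out `agent`.
        for assignment in itertools.product(*(range(num_actions[v]) for v in scope)):
            ctx = dict(zip(scope, assignment))
            best_val, best_a = -math.inf, 0
            for a in range(num_actions[agent]):
                ctx[agent] = a
                val = sum(t[tuple(ctx[v] for v in s)] for s, t in involved)
                if val > best_val:
                    best_val, best_a = val, a
            table[assignment] = best_val
            best[assignment] = best_a
        factors.append((scope, table))
        trace.append((agent, scope, best))
    total = sum(t[()] for _, t in factors)  # only empty-scope factors remain
    joint: Dict[int, int] = {}
    for agent, scope, best in reversed(trace):  # back-substitute the argmaxes
        joint[agent] = best[tuple(joint[v] for v in scope)]
    return joint, total


class FactoredUCB:
    """Hypothetical UCB1-style learner with one statistic per (group, local action)."""

    def __init__(self, num_actions: List[int], groups: List[Scope]):
        self.num_actions, self.groups, self.t = num_actions, groups, 0
        self.counts = [{k: 0 for k in self._local(g)} for g in groups]
        self.sums = [{k: 0.0 for k in self._local(g)} for g in groups]

    def _local(self, g: Scope):
        return itertools.product(*(range(self.num_actions[v]) for v in g))

    def select(self) -> Dict[int, int]:
        self.t += 1
        factors = []
        for g, cnt, sm in zip(self.groups, self.counts, self.sums):
            # Optimistic local value: empirical mean + UCB1 bonus; untried = +inf.
            table = {k: (math.inf if n == 0 else
                         sm[k] / n + math.sqrt(2 * math.log(self.t) / n))
                     for k, n in cnt.items()}
            factors.append((g, table))
        order = list(range(len(self.num_actions)))
        joint, _ = eliminate_max(self.num_actions, factors, order)
        return joint

    def update(self, joint: Dict[int, int], local_rewards: List[float]) -> None:
        for g, cnt, sm, r in zip(self.groups, self.counts, self.sums, local_rewards):
            key = tuple(joint[v] for v in g)
            cnt[key] += 1
            sm[key] += r


# Toy 3-agent chain with pairwise couplings (0, 1) and (1, 2).
q01 = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
q12 = {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 1.0, (1, 1): 0.0}
print(eliminate_max([2, 2, 2], [((0, 1), q01), ((1, 2), q12)], [0, 1, 2]))
# -> ({2: 1, 1: 0, 0: 0}, 4.0)

learner = FactoredUCB([2, 2, 2], [(0, 1), (1, 2)])
for _ in range(200):
    a = learner.select()
    learner.update(a, [q01[(a[0], a[1])], q12[(a[1], a[2])]])
```

Variable elimination replaces a search over all joint actions, whose number grows exponentially with the number of agents, with a sequence of local maximizations whose cost depends on the size of the intermediate factors; for a chain of pairwise couplings, each step only enumerates pairs of actions.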

Supported by the Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University.

Author information

Authors and Affiliations

Nanyang Technological University, Singapore, Singapore
Feifei Lin, Xu He & Bo An

Corresponding author

Correspondence to Feifei Lin.

Editor information

Editors and Affiliations

University of Alberta, Edmonton, AB, Canada
Matthew E. Taylor
Nanjing University, Nanjing, China
Yang Yu
University of Oxford, Oxford, UK
Edith Elkind
Nanjing University, Nanjing, China
Yang Gao

About this paper

Cite this paper

Lin, F., He, X., An, B. (2020). Context-Aware Multi-agent Coordination with Loose Couplings and Repeated Interaction. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham. https://doi.org/10.1007/978-3-030-64096-5_8

DOI: https://doi.org/10.1007/978-3-030-64096-5_8
Published: 25 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64095-8
Online ISBN: 978-3-030-64096-5
eBook Packages: Computer Science, Computer Science (R0)