Abstract
Coordination between multiple agents arises in many areas of industry and society. Despite a few recent advances, the coordination problem remains challenging due to its combinatorial nature. First, because the joint action set scales exponentially with the number of agents, it is hard to search effectively and to find the right balance between exploration and exploitation. Second, performing maximization over all agents' actions jointly is computationally intractable. To tackle these challenges, we exploit side information and loose couplings, i.e., conditional independence between agents, both of which are often available in coordination tasks. We make several key contributions in this paper. First, the repeated multi-agent coordination problem is formulated as a multi-agent contextual bandit problem to balance the exploration-exploitation trade-off. Second, a novel algorithm called MACUCB is proposed, which uses a modified zooming technique to improve the context exploitation process and a variable elimination technique to perform the maximization efficiently by exploiting the loose couplings. Third, two enhancements to MACUCB are proposed with improved theoretical guarantees. Fourth, we derive theoretical bounds on the regret of each of the algorithms. Finally, to demonstrate the effectiveness of our methods, we apply MACUCB and its variants to a realistic cloudlet resource rental problem. In this problem, cloudlets must coordinate their computation resources in order to optimize the quality of service at a low cost. We evaluate our approaches on a real-world dataset, and the results show that MACUCB and its variants significantly outperform other benchmarks.
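To illustrate why loose couplings make the joint maximization tractable, the following is a minimal sketch of generic variable elimination over a sum of local payoff tables. It is not the MACUCB implementation: the function name `eliminate_max`, the table layout, and the three-agent chain with its payoff values are all hypothetical and chosen only for illustration. In a bandit setting the tables would presumably hold estimated values rather than known payoffs.

```python
from itertools import product


def eliminate_max(factors, domains, order):
    """Maximize a sum of local payoff tables by variable elimination.

    factors: list of (scope, table); scope is a tuple of agent ids and
             table maps a tuple of actions (same order as scope) to a payoff.
    domains: dict mapping each agent id to its list of actions.
    order:   elimination order containing every agent exactly once.
    Returns (maximum total payoff, dict mapping agent -> chosen action).
    """
    factors = list(factors)
    choices = []  # (agent, neighbor scope, best-response table)
    for agent in order:
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        neighbors = sorted({v for scope, _ in involved for v in scope if v != agent})
        new_table, best = {}, {}
        # For every joint action of the neighbors, store the agent's best response.
        for nvals in product(*(domains[v] for v in neighbors)):
            assignment = dict(zip(neighbors, nvals))
            best_val, best_act = None, None
            for action in domains[agent]:
                assignment[agent] = action
                total = sum(
                    table[tuple(assignment[v] for v in scope)]
                    for scope, table in involved
                )
                if best_val is None or total > best_val:
                    best_val, best_act = total, action
            new_table[nvals] = best_val
            best[nvals] = best_act
        # The eliminated agent is replaced by a factor over its neighbors only.
        factors.append((tuple(neighbors), new_table))
        choices.append((agent, tuple(neighbors), best))
    # Every remaining factor now has an empty scope.
    max_value = sum(table[()] for _, table in factors)
    # Recover the maximizing joint action in reverse elimination order.
    joint = {}
    for agent, neighbors, best in reversed(choices):
        joint[agent] = best[tuple(joint[v] for v in neighbors)]
    return max_value, joint


# Hypothetical chain 0 - 1 - 2 with two actions per agent.
domains = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
f01 = ((0, 1), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0})
f12 = ((1, 2), {(0, 0): 3.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0})
max_value, joint = eliminate_max([f01, f12], domains, order=[0, 1, 2])
```

On this chain each elimination step enumerates only the actions of one agent and its direct neighbors, so the work grows with the size of the largest local neighborhood instead of with the full joint action set.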
Supported by the Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lin, F., He, X., An, B. (2020). Context-Aware Multi-agent Coordination with Loose Couplings and Repeated Interaction. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham. https://doi.org/10.1007/978-3-030-64096-5_8
DOI: https://doi.org/10.1007/978-3-030-64096-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64095-8
Online ISBN: 978-3-030-64096-5
eBook Packages: Computer Science, Computer Science (R0)