Abstract
Training distributed reinforcement learning models over a network of users (or agents) has great potential for many applications on distributed devices, such as face recognition, health tracking, recommender systems, and smart homes. Cooperation among networked agents, in which they share and aggregate their model parameters, can considerably improve learning performance. However, agents may have different objectives, and unplanned cooperation may lead to undesired outcomes. It is therefore important to ensure that cooperation in distributed learning is beneficial, especially when agents receive information from unidentifiable peers. In this paper, we consider the problem of training distributed reinforcement learning models, focusing on distributed actor-critic algorithms because they have been used successfully in many application domains. We propose an efficient adaptive cooperation strategy with linear time complexity that captures the similarities among agents and assigns adaptive weights for aggregating the parameters received from neighboring agents. Essentially, a larger weight is assigned to a neighboring agent that performs a similar task or shares a similar objective. The approach has significant advantages when different agents are assigned different tasks and when adversarial agents are present. Empirical results validate the proposed approach and demonstrate its effectiveness in improving learning performance in single-task, multi-task, and adversarial scenarios.
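To make the idea of adaptive aggregation weights concrete, the following is a minimal sketch and not the paper's algorithm. It assumes that each agent evaluates its own critic loss on the parameters received from each neighbor, and that the weights are taken inversely proportional to those losses and normalized to sum to one. The function names `adaptive_weights` and `aggregate`, the inverse-loss rule, and the `eps` parameter are illustrative assumptions; the abstract states only that more similar neighbors receive larger weights and that the strategy runs in linear time.

```python
import numpy as np


def adaptive_weights(losses, eps=1e-12):
    """Return normalized weights that shrink as a neighbor's loss grows.

    Assumption (not from the source): weight_l is proportional to 1 / loss_l,
    where loss_l is this agent's own loss evaluated on neighbor l's parameters.
    """
    losses = np.asarray(losses, dtype=float)
    if np.any(losses < 0):
        # Inverse-loss weighting only makes sense for non-negative losses.
        raise ValueError("losses must be non-negative")
    inverse = 1.0 / (losses + eps)  # eps guards against division by zero
    return inverse / inverse.sum()


def aggregate(neighbor_params, losses):
    """Combine neighbor parameter vectors using the adaptive weights.

    One pass over the neighbors, so the cost is linear in their number.
    """
    weights = adaptive_weights(losses)
    stacked = np.stack(neighbor_params)  # shape: (n_neighbors, dim)
    return weights @ stacked, weights


# Hypothetical usage: the third neighbor fits this agent's data worst,
# so it contributes least to the aggregated parameters.
params = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-5.0, 4.0])]
combined, weights = aggregate(params, losses=[0.5, 0.5, 50.0])
```

Under this assumed rule, a neighbor with a different task or an adversarial neighbor would tend to produce a large loss on the agent's own data and would therefore be down-weighted rather than excluded outright.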
Notes
1. A factor of \(\frac{1}{2}\) is introduced into the objective function to simplify the solution.
2. If the loss of the critic can be negative in a particular algorithm, a softmax layer can be applied to \(L^w_{k,t}(\cdot )\) in the objective function.
3. The simulation code is available at https://github.com/cbhowmic/resilient-adaptive-RL.
4. The maximum value for each task is shown in bold; ± denotes one standard deviation over the network.
5. The solid lines in the plots show the average return of the agents, and the shaded area represents its range.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bhowmick, C., Li, J., Koutsoukos, X. (2023). Adaptive Learning from Peers for Distributed Actor-Critic Algorithms. In: Ossowski, S., Sitek, P., Analide, C., Marreiros, G., Chamoso, P., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 20th International Conference. DCAI 2023. Lecture Notes in Networks and Systems, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-031-38333-5_6
DOI: https://doi.org/10.1007/978-3-031-38333-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38332-8
Online ISBN: 978-3-031-38333-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)