Adaptive Learning from Peers for Distributed Actor-Critic Algorithms

Bhowmick, Chandreyee; Li, Jiani; Koutsoukos, Xenofon

doi:10.1007/978-3-031-38333-5_6

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 740))

Included in the following conference series:

International Symposium on Distributed Computing and Artificial Intelligence

Abstract

Training distributed reinforcement learning models over a network of users (or agents) has great potential for applications on distributed devices, such as face recognition, health tracking, recommender systems, and smart homes. Cooperation among networked agents, in which they share and aggregate their model parameters, can considerably improve learning performance. However, agents may have different objectives, and unplanned cooperation may lead to undesired outcomes. It is therefore important to ensure that cooperation in distributed learning is beneficial, especially when agents receive information from unidentifiable peers. In this paper, we consider the problem of training distributed reinforcement learning models, focusing on distributed actor-critic algorithms because they have been used successfully in many application domains. We propose an efficient adaptive cooperation strategy with linear time complexity that captures the similarities among agents and assigns adaptive weights for aggregating the parameters received from neighboring agents. Essentially, a larger weight is assigned to a neighboring agent that performs a similar task or shares a similar objective. The approach has significant advantages when different agents are assigned different tasks and when adversarial agents are present. Empirical results validate the proposed approach and demonstrate its effectiveness in improving learning performance in single-task, multi-task, and adversarial scenarios.
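
The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that an agent scores each neighbor by the critic loss that neighbor's parameters produce on the agent's own data, and that weights are taken inversely proportional to that loss; the abstract does not state the exact rule, so this is not presented as the authors' method. The function names `adaptive_weights` and `aggregate`, the `eps` guard, and the sample numbers are all hypothetical.

```python
def adaptive_weights(losses, eps=1e-12):
    """Return aggregation weights inversely proportional to each loss.

    losses[j] is the (hypothetical) critic loss agent k measures when it
    evaluates neighbor j's parameters on its own data. Losses are assumed
    non-negative; eps only guards against division by zero.
    """
    if any(loss < 0 for loss in losses):
        raise ValueError("losses must be non-negative")
    inverse = [1.0 / (loss + eps) for loss in losses]
    total = sum(inverse)
    # One pass to invert, one pass to normalize: linear in the number of neighbors.
    return [value / total for value in inverse]


def aggregate(params, weights):
    """Weighted average of the neighbors' parameter vectors."""
    dim = len(params[0])
    return [sum(w * p[i] for w, p in zip(weights, params)) for i in range(dim)]


# Agent k's own parameters first, then a similar neighbor and a dissimilar
# (or adversarial) neighbor. All values are made up for the example.
neighbor_params = [[1.0, 2.0], [1.2, 1.8], [-5.0, 9.0]]
neighbor_losses = [0.5, 0.5, 50.0]

weights = adaptive_weights(neighbor_losses)
combined = aggregate(neighbor_params, weights)
```

Under these assumptions, the neighbor with the large loss receives a weight close to zero, so the aggregated parameters stay near those of the agent and its similar neighbor.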

Notes

1. A factor of \(\frac{1}{2}\) is introduced into the objective function to simplify the solution.
2. If the critic loss can be negative in a particular algorithm, a softmax layer can be applied to \(L^w_{k,t}(\cdot )\) in the objective function.
3. The simulation code is available at https://github.com/cbhowmic/resilient-adaptive-RL.
4. The maximum value for each task is shown in bold; ± denotes one standard deviation over the network.
5. The solid lines in the plots show the average return of the agents, and the shaded areas show its range.
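
Notes 1 and 2 can be illustrated with a short worked example. The objective below is an assumption made only for illustration and is not taken from the chapter; the symbols \(a_l\), \(L_l\), \(\mathcal{N}_k\), and \(\lambda\) are introduced here. Suppose agent \(k\) chooses weights \(a_l\) over its neighborhood \(\mathcal{N}_k\) by solving

\[
\min_{a} \; \frac{1}{2} \sum_{l \in \mathcal{N}_k} a_l^2 L_l
\quad \text{subject to} \quad \sum_{l \in \mathcal{N}_k} a_l = 1,
\]

where \(L_l > 0\) is the loss associated with neighbor \(l\). With a Lagrange multiplier \(\lambda\) for the constraint, setting the derivative with respect to \(a_l\) to zero gives

\[
a_l L_l - \lambda = 0
\quad \Longrightarrow \quad
a_l = \frac{\lambda}{L_l},
\]

and the constraint then yields

\[
a_l = \frac{L_l^{-1}}{\sum_{p \in \mathcal{N}_k} L_p^{-1}}.
\]

In this sketch the factor \(\frac{1}{2}\) cancels the 2 produced by differentiating \(a_l^2\), and the closed form requires every \(L_l\) to be positive, which is consistent with mapping a possibly negative loss to a positive value before using it.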

Author information

Authors and Affiliations

Institute of Software Integrated Systems, Vanderbilt University, Nashville, TN, 37209, USA
Chandreyee Bhowmick, Jiani Li & Xenofon Koutsoukos

Authors

Chandreyee Bhowmick
Jiani Li
Xenofon Koutsoukos

Corresponding author

Correspondence to Chandreyee Bhowmick.

Editor information

Editors and Affiliations

ETSII / CETINIA, Universidad Rey Juan Carlos, Madrid, Madrid, Spain
Sascha Ossowski
Politechnika Śląska Kielce, Kielce University of Technology, Santa Cruz, Poland
Pawel Sitek
Department of Informatics, Engineering School, University of Minho, Braga, Portugal
Cesar Analide
Departamento de Engenharia Informática, Polytechnic Institute of Porto, Porto, Portugal
Goreti Marreiros
BISITE, University of Salamanca, Salamanca, Salamanca, Spain
Pablo Chamoso
BISITE, University of Salamanca, Salamanca, Salamanca, Spain
Sara Rodríguez

About this paper

Cite this paper

Bhowmick, C., Li, J., Koutsoukos, X. (2023). Adaptive Learning from Peers for Distributed Actor-Critic Algorithms. In: Ossowski, S., Sitek, P., Analide, C., Marreiros, G., Chamoso, P., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 20th International Conference. DCAI 2023. Lecture Notes in Networks and Systems, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-031-38333-5_6

DOI: https://doi.org/10.1007/978-3-031-38333-5_6
Published: 21 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38332-8
Online ISBN: 978-3-031-38333-5
eBook Packages: Intelligent Technologies and Robotics (R0)
