
Byzantine Resilient Aggregation in Distributed Reinforcement Learning

Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 327)

Abstract

Recent distributed reinforcement learning techniques utilize networked agents to accelerate exploration and speed up learning. However, such techniques are not resilient in the presence of Byzantine agents, which can disturb convergence. In this paper, we present a Byzantine resilient aggregation rule for distributed reinforcement learning with networked agents that incorporates the idea of optimizing the objective function into the design of the aggregation rule. We evaluate our approach using multiple reinforcement learning environments for both value-based and policy-based methods with homogeneous and heterogeneous agents. The results show that cooperation using the proposed approach exhibits better learning performance than the non-cooperative case and is resilient in the presence of an arbitrary number of Byzantine agents.
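
The abstract describes the aggregation rule only at a high level; the full rule is developed in the paper itself. A minimal sketch of the general idea stated above, namely scoring received models against the agent's own learning objective and aggregating only the models that score at least as well, is shown below. The function names, the acceptance test, and the use of a plain average are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def resilient_aggregate(own_params, neighbor_params, local_objective):
    """Hypothetical objective-based aggregation (a sketch, not the paper's exact rule).

    own_params      : np.ndarray with the agent's current model parameters.
    neighbor_params : list of np.ndarray received from neighbors, any number of
                      which may come from Byzantine agents.
    local_objective : callable mapping parameters to a score evaluated on the
                      agent's own experience (e.g. average episodic return);
                      higher is better.
    """
    own_score = local_objective(own_params)

    # Accept only neighbor models that perform at least as well as the agent's
    # own model on its local objective; arbitrarily bad (Byzantine) models are
    # rejected no matter how many of them there are.
    accepted = [p for p in neighbor_params if local_objective(p) >= own_score]

    if not accepted:
        # No neighbor helps, so fall back to the non-cooperative update.
        return own_params

    # Combine the agent's own model with the accepted neighbor models.
    return np.mean(np.stack([own_params] + accepted), axis=0)
```

In an actual run, each agent would invoke such a rule after a fixed number of learning steps, with local_objective estimated, for example, from a few evaluation episodes of the received policy or from the TD error of a received value function on locally collected data.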


Notes

  1. An episode is a sequence of states from the start state to a terminal state.

  2. During learning, t increases while i remains the same; during model aggregation, i increases while t changes from \(\infty\) to 0. To simplify, we hereafter omit the subscripts of t for cooperation and the superscripts of i for learning. (One plausible arrangement of the two indices is sketched in the code after these notes.)

  3. Methods for updating these parameters are discussed in Sect. 3.
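
To make the indexing in note 2 concrete, the following toy simulation shows one plausible arrangement of the two loops: an inner learning phase in which the step index t advances while the round index i stays fixed, followed by an aggregation phase after which i advances and t restarts at 0. A quadratic objective stands in for the reinforcement-learning objective, the network is assumed fully connected, and resilient_aggregate is the hypothetical sketch given after the abstract; every other name and number here is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the learning objective: honest agents minimize
# ||params - target||^2, so the negated squared error plays the role of the
# return that an RL agent would try to maximize.
target = np.array([1.0, -2.0, 0.5])

def local_objective(params):
    return -np.sum((params - target) ** 2)

def local_learning_step(params, lr=0.1):
    # One gradient step on the toy objective; in the actual algorithm this
    # would be a value-based or policy-gradient update.
    return params - lr * 2.0 * (params - target)

NUM_AGENTS = 5
NUM_BYZANTINE = 2        # these agents broadcast arbitrary parameters
NUM_HONEST = NUM_AGENTS - NUM_BYZANTINE
NUM_ROUNDS = 20          # aggregation rounds (index i in note 2)
STEPS_PER_ROUND = 10     # learning steps per round (index t in note 2)

params = [rng.normal(size=3) for _ in range(NUM_AGENTS)]

for i in range(NUM_ROUNDS):
    # Learning phase: t increases while i stays fixed.
    for t in range(STEPS_PER_ROUND):
        for a in range(NUM_HONEST):
            params[a] = local_learning_step(params[a])
    # Byzantine agents send arbitrary values instead of learned models.
    for a in range(NUM_HONEST, NUM_AGENTS):
        params[a] = rng.normal(scale=100.0, size=3)
    # Aggregation phase: after this step i advances and t restarts at 0.
    new_params = []
    for a in range(NUM_HONEST):
        neighbors = [params[b] for b in range(NUM_AGENTS) if b != a]
        new_params.append(resilient_aggregate(params[a], neighbors, local_objective))
    params[:NUM_HONEST] = new_params

# Honest agents' objectives should approach 0 despite the Byzantine agents.
print([local_objective(params[a]) for a in range(NUM_HONEST)])
```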


Author information

Corresponding author

Correspondence to Jiani Li.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, J., Cai, F., Koutsoukos, X. (2022). Byzantine Resilient Aggregation in Distributed Reinforcement Learning. In: Matsui, K., Omatu, S., Yigitcanlar, T., González, S.R. (eds) Distributed Computing and Artificial Intelligence, Volume 1: 18th International Conference. DCAI 2021. Lecture Notes in Networks and Systems, vol 327. Springer, Cham. https://doi.org/10.1007/978-3-030-86261-9_6
