ToFi: An Algorithm to Defend Against Byzantine Attacks in Federated Learning

Abstract

In distributed gradient-descent-based machine learning model training, workers periodically upload locally computed gradients or weights to the parameter server (PS). Byzantine attacks take place when some workers upload wrong gradients or weights, i.e., the information received by the PS is not always the true values computed by the workers. Score-based, median-based, and distance-based defense algorithms have been proposed previously, but all of them rely on two assumptions: (1) the dataset on each worker is independent and identically distributed (i.i.d.), and (2) the majority of all participating workers are honest. These assumptions are not realistic in federated learning, where each worker may keep a non-i.i.d. private dataset and malicious workers may form the majority in some iterations. In this paper, we propose a novel reference-dataset-based algorithm, along with a practical Two-Filter algorithm (ToFi), to defend against Byzantine attacks in federated learning. Our experiments highlight the effectiveness of our algorithm compared with previous algorithms in different settings.
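The algorithm's details are not included in this preview, so the following is only a minimal sketch of the general reference-dataset idea, in the spirit of Zeno-style scoring [22]: the PS keeps a small reference dataset and filters out gradients that do not decrease the reference loss. The toy model, names, and threshold below are illustrative assumptions, not ToFi's actual two filters.

```python
# Hypothetical sketch of reference-dataset-based filtering at the parameter
# server (PS); Zeno-style scoring [22], not ToFi's actual two filters.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression task: the PS keeps a small reference dataset.
X_ref = rng.normal(size=(64, 10))
w_true = rng.normal(size=10)
y_ref = X_ref @ w_true

def ref_loss(w):
    return np.mean((X_ref @ w - y_ref) ** 2)

def score(w, grad, lr=0.1):
    # Loss decrease on the reference data if this gradient were applied.
    return ref_loss(w) - ref_loss(w - lr * grad)

w = np.zeros(10)
true_grad = 2 * X_ref.T @ (X_ref @ w - y_ref) / len(X_ref)

# Honest workers send noisy versions of the true gradient; Byzantine workers
# send pure Gaussian noise (the "Gaussian attack" from the experiments).
received = [true_grad + 0.1 * rng.normal(size=10) for _ in range(5)] + \
           [10 * rng.normal(size=10) for _ in range(3)]

# Keep only gradients whose reference-loss decrease is positive, then average.
kept = [g for g in received if score(w, g) > 0]
aggregated = np.mean(kept, axis=0)
print(f"kept {len(kept)} of {len(received)} gradients")
```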


References

  1. Alistarh, D., Allen-Zhu, Z., Li, J.: Byzantine stochastic gradient descent. CoRR abs/1803.08917 (2018). http://arxiv.org/abs/1803.08917

  2. Blanchard, P., El Mhamdi, E.M., Guerraoui, R., Stainer, J.: Machine learning with adversaries: byzantine tolerant gradient descent. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 119–129. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6617-machine-learning-with-adversaries-byzantine-tolerant-gradient-descent.pdf

  3. Chen, Y., Su, L., Xu, J.: Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proc. ACM Meas. Anal. Comput. Syst. 1(2), 1–25 (2017). https://doi.org/10.1145/3154503

  4. Damaskinos, G., El Mhamdi, E.M., Guerraoui, R., Patra, R., Taziki, M.: Asynchronous Byzantine machine learning (the case of SGD). In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 1145–1154. PMLR, Stockholmsmässan, Stockholm, Sweden (10–15 Jul 2018). http://proceedings.mlr.press/v80/damaskinos18a.html

  5. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  6. Konecný, J., McMahan, H., Yu, F., Richtárik, P., Suresh, A., Bacon, D.: Federated learning: strategies for improving communication efficiency. CoRR abs/1610.05492 (2016)

  7. Konstantinov, N., Lampert, C.: Robust learning from untrusted sources. CoRR abs/1901.10310 (2019). http://arxiv.org/abs/1901.10310

  8. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)

  9. Mao, Y., Hong, W., Wang, H., Li, Q., Zhong, S.: Privacy-preserving computation offloading for parallel deep neural networks training. IEEE Trans. Parallel Distrib. Syst. 32(7), 1777–1788 (2021). https://doi.org/10.1109/TPDS.2020.3040734

  10. Mao, Y., Yi, S., Li, Q., Feng, J., Xu, F., Zhong, S.: Learning from differentially private neural activations with edge computing. In: 2018 IEEE/ACM Symposium on Edge Computing (SEC), pp. 90–102 (2018). https://doi.org/10.1109/SEC.2018.00014

  11. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) (2017). http://arxiv.org/abs/1602.05629

  12. Paudice, A., Muñoz-González, L., Lupu, E.C.: Label sanitization against label flipping poisoning attacks. In: Alzate, C., et al. (eds.) ECML PKDD 2018 Workshops, pp. 5–15. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-13453-2_1

  13. Tao, Z., Li, Q.: eSGD: Communication efficient distributed deep learning on the edge. In: USENIX Workshop on Hot Topics in Edge Computing (HotEdge 18). USENIX Association, Boston, MA, July 2018

  14. Tao, Z., et al.: A survey of virtual machine management in edge computing. Proc. IEEE 107(8), 1482–1499 (2019). https://doi.org/10.1109/JPROC.2019.2927919

  15. Wu, Y., He, K.: Group normalization. Int. J. Comput. Vis. 128(3), 742–755 (2020). https://doi.org/10.1007/s11263-019-01198-w

  16. Xia, Q., Tao, Z., Hao, Z., Li, Q.: FABA: an algorithm for fast aggregation against Byzantine attacks in distributed neural networks. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 4824–4830. International Joint Conferences on Artificial Intelligence Organization (July 2019). https://doi.org/10.24963/ijcai.2019/670

  17. Xia, Q., Tao, Z., Li, Q.: Defenses against Byzantine attacks in distributed deep neural networks. IEEE Trans. Netw. Sci. Eng. (2020). https://doi.org/10.1109/TNSE.2020.3035112

  18. Xia, Q., Tao, Z., Li, Q.: Privacy issues in edge computing. In: Chang, W., Wu, J. (eds.) Fog/Edge Computing For Security, Privacy, and Applications. AIS, vol. 83, pp. 147–169. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-57328-7_6

  19. Xia, Q., Ye, W., Tao, Z., Wu, J., Li, Q.: A survey of federated learning for edge computing: Research problems and solutions. High-Confidence Computing (2021). https://doi.org/10.1016/j.hcc.2021.100008

  20. Xie, C., Koyejo, O., Gupta, I.: Generalized Byzantine-tolerant SGD. CoRR abs/1802.10116 (2018). http://arxiv.org/abs/1802.10116

  21. Xie, C., Koyejo, O., Gupta, I.: Zeno++: robust fully asynchronous SGD (2020). https://openreview.net/forum?id=rygHe64FDS

  22. Xie, C., Koyejo, S., Gupta, I.: Zeno: Distributed stochastic gradient descent with suspicion-based fault-tolerance. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 97, pp. 6893–6901. PMLR, Long Beach, California, USA (09–15 Jun 2019). http://proceedings.mlr.press/v97/xie19b.html

  23. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791

  24. Yin, D., Chen, Y., Kannan, R., Bartlett, P.: Byzantine-robust distributed learning: towards optimal statistical rates. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 5650–5659. PMLR, Stockholmsmässan, Stockholm, Sweden (10–15 Jul 2018). http://proceedings.mlr.press/v80/yin18a.html

  25. Zhang, M., Hu, L., Shi, C., Wang, X.: Adversarial label-flipping attack and defense for graph neural networks. In: 2020 IEEE International Conference on Data Mining (ICDM), pp. 791–800 (2020). https://doi.org/10.1109/ICDM50108.2020.00088

Acknowledgements

We thank all the reviewers for their constructive comments. This project was supported in part by US National Science Foundation grant CNS-1816399. This work was also supported in part by the Commonwealth Cyber Initiative, an investment in the advancement of cyber R&D, innovation and workforce development. For more information about CCI, visit cyberinitiative.org.

Author information

Corresponding author

Correspondence to Qi Xia.

A More Experiments on MNIST

All experiments in this section use the same settings as in the main paper; any differences are noted explicitly. Here we supplement the main results with experiments on the MNIST dataset [23] and with more workers. The model is LeNet-5 [23].
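For readers reproducing this setup, a standard LeNet-5 adapted to 28×28 MNIST inputs might look like the PyTorch sketch below; the paper's exact variant (activations, pooling, padding) may differ.

```python
# A standard LeNet-5 [23] adapted to 28x28 MNIST inputs (padding the first
# convolution is a common choice; the paper's exact variant may differ).
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet5()
print(model(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```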

A.1 Federated Learning with All Node Participation

Naive Heterogeneous Environment. We compare ToFi with three classic methods and the ground truth (all Byzantine gradients filtered out, average aggregation) in the three Byzantine environments described in the main paper (Gaussian, wrong label, and one bit) as well as a no-Byzantine environment. The distributed environment is the naive heterogeneous environment, and we use an interval length of 10. The results are in Fig. 6. From this figure, we can see that Krum does not perform as well as ToFi, GeoMedian, and FABA, while those three methods perform very similarly in the naive heterogeneous environment under all three types of attacks. In the no-Byzantine scenario, ToFi, GeoMedian, and FABA again perform similarly, while Krum has lower accuracy. (A minimal sketch of the Krum baseline is given after Fig. 6.)

Fig. 6. Experiment results of different algorithms for Gaussian, wrong label, and one bit Byzantine attacks and the no-Byzantine scenario in the naive heterogeneous environment on the MNIST dataset
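For reference, the Krum baseline [2] scores each received gradient by the sum of squared distances to its n − f − 2 nearest neighbors and outputs the lowest-scoring gradient. A minimal NumPy sketch with illustrative toy data:

```python
# Krum [2]: pick the single gradient closest to its n - f - 2 nearest
# neighbors in squared Euclidean distance (f = number of Byzantine workers).
import numpy as np

def krum(grads: np.ndarray, f: int) -> np.ndarray:
    """grads: (n, d) array of received gradients."""
    n = len(grads)
    assert n - f - 2 > 0, "Krum's guarantees assume n >= 2f + 3"
    dists = np.sum((grads[:, None, :] - grads[None, :, :]) ** 2, axis=-1)
    scores = []
    for i in range(n):
        d = np.sort(np.delete(dists[i], i))    # distances to all other gradients
        scores.append(np.sum(d[: n - f - 2]))  # sum over the closest n - f - 2
    return grads[int(np.argmin(scores))]

rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(6, 4))      # clustered near the truth
byzantine = rng.normal(loc=-5.0, scale=1.0, size=(2, 4))  # far-away outliers
print(krum(np.vstack([honest, byzantine]), f=2))          # returns an honest gradient
```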

Enhanced Heterogeneous Environment. To show the difference, we compare ToFi with the three classic methods and the ground truth (all Byzantine gradients filtered out, average aggregation) in the three Byzantine environments (Gaussian, wrong label, and one bit) and the no-Byzantine environment, this time in the enhanced heterogeneous environment. We again use an interval length of 10. The results are in Fig. 7. From this figure, we can see that ToFi performs much better than Krum, FABA, and GeoMedian. GeoMedian has the second-best performance for the Gaussian and one bit attacks; FABA has the second-best performance for the wrong label attack and the no-Byzantine scenario. However, both show a significant accuracy decline compared with our algorithm. Krum has the worst performance in the enhanced heterogeneous environment. (A sketch of the GeoMedian baseline is given after Fig. 7.)

Fig. 7. Experiment results of different algorithms for Gaussian, wrong label, and one bit Byzantine attacks and the no-Byzantine scenario in the enhanced heterogeneous environment on the MNIST dataset
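Likewise, the GeoMedian baseline aggregates with the geometric median, which minimizes the sum of Euclidean distances to the received gradients and tolerates extreme outliers. A minimal sketch using Weiszfeld's iteration (the iteration count and epsilon are arbitrary choices):

```python
# Geometric-median aggregation via Weiszfeld's iteration.
import numpy as np

def geometric_median(points: np.ndarray, iters: int = 100,
                     eps: float = 1e-8) -> np.ndarray:
    """points: (n, d) array of gradients; returns an approximate geometric median."""
    median = points.mean(axis=0)  # initialize at the mean
    for _ in range(iters):
        dists = np.maximum(np.linalg.norm(points - median, axis=1), eps)
        weights = 1.0 / dists     # re-weight inversely by distance
        median = (weights[:, None] * points).sum(axis=0) / weights.sum()
    return median

rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(6, 4))
byzantine = np.full((2, 4), 100.0)  # extreme outliers
print(geometric_median(np.vstack([honest, byzantine])))  # stays near 1.0
```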

More Workers Experiment. Because of hardware limitations, we cannot run experiments with more than 8 workers on the CIFAR-10 dataset, so we examine the scenario with more workers only on the MNIST dataset. In this experiment, we use 32 workers, 8 of which are Byzantine. To show the difference, we examine this setting in the enhanced heterogeneous environment. The results are in Fig. 8 and are very similar to the 8-worker scenario: ToFi still outperforms the other algorithms. For the Gaussian attack, ToFi performs similarly to FABA and beats all other algorithms; for the wrong label and one bit attacks, ToFi performs much better than the others. The best performance here is not as good as in the no-Byzantine case because we fix which workers suffer Byzantine attacks: since this experiment uses the enhanced heterogeneous environment, the data with some labels may be held entirely by Byzantine workers, which lowers the best achievable accuracy. (Sketches of the three attack types are given after Fig. 8.)

Fig. 8. Experiment results of different algorithms for Gaussian, wrong label, and one bit Byzantine attacks in the enhanced heterogeneous environment with 32 workers on the MNIST dataset
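For concreteness, hedged sketches of the three attack types follow, using common definitions from the Byzantine-SGD literature; the main paper's exact attack parameters may differ, so treat the details as assumptions.

```python
# Illustrative Byzantine attacks; the definitions are common-in-the-literature
# assumptions, not necessarily the paper's exact parameters.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_attack(grad: np.ndarray, scale: float = 10.0) -> np.ndarray:
    # Replace the true gradient with large Gaussian noise.
    return rng.normal(scale=scale, size=grad.shape)

def one_bit_attack(grad: np.ndarray) -> np.ndarray:
    # One common reading: flip the sign (bit) of the gradient. An assumption
    # here; the paper defines this attack precisely.
    return -grad

def flip_labels(y: np.ndarray, num_classes: int = 10) -> np.ndarray:
    # The wrong label attack corrupts training data instead: the worker
    # computes an honest gradient, but on shifted labels.
    return (y + 1) % num_classes

g = np.ones(4)
print(gaussian_attack(g), one_bit_attack(g), flip_labels(np.array([0, 1, 9])))
```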

A.2 Federated Learning with Partial Node Participation

We compare Krum, GeoMedian, FABA, and Zeno with ToFi in the federated learning environment, using a setting similar to the CIFAR-10 experiments. The results are in Fig. 9. We can see that ToFi outperforms all the other algorithms; none of them is designed for federated learning, and they therefore perform poorly. (A small simulation of per-round worker sampling is given after Fig. 9.)

Fig. 9. Experiment results of the Gaussian attack in the enhanced heterogeneous environment in federated learning
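A small simulation illustrates why partial node participation is hard for majority-based defenses: even when Byzantine workers are a fixed minority overall, the workers sampled in a given round can contain a Byzantine majority. The 32-worker, 8-Byzantine counts below match this appendix; the per-round sample size of 8 is an assumption.

```python
# Sampling rounds in federated learning with partial node participation.
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_byzantine, sample_size = 32, 8, 8
byzantine_ids = set(range(n_byzantine))  # Byzantine workers are fixed

majority_rounds = 0
for _ in range(10_000):
    sampled = rng.choice(n_workers, size=sample_size, replace=False)
    if sum(w in byzantine_ids for w in sampled) > sample_size // 2:
        majority_rounds += 1

# With 25% Byzantine workers overall, roughly 1% of rounds still sample a
# Byzantine majority, violating the majority-honest assumption in those rounds.
print(f"{majority_rounds} of 10000 rounds had a Byzantine majority")
```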

Copyright information

© 2021 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Xia, Q., Tao, Z., Li, Q. (2021). ToFi: An Algorithm to Defend Against Byzantine Attacks in Federated Learning. In: Garcia-Alfaro, J., Li, S., Poovendran, R., Debar, H., Yung, M. (eds) Security and Privacy in Communication Networks. SecureComm 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 398. Springer, Cham. https://doi.org/10.1007/978-3-030-90019-9_12

  • DOI: https://doi.org/10.1007/978-3-030-90019-9_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90018-2

  • Online ISBN: 978-3-030-90019-9
