
Non-IID Distributed Learning with Optimal Mixture Weights

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13716)

Abstract

Distributed learning effectively addresses the problem of training models on large-scale data and has attracted much attention in recent years. However, most existing distributed learning algorithms assign uniform mixture weights across clients when aggregating the global model, which impairs accuracy under the Non-IID (Not Independently or Identically Distributed) setting. In this paper, we present a general framework that optimizes the mixture weights and show theoretically that it achieves lower expected loss than the uniform-mixture-weights framework. Moreover, we provide a strong generalization guarantee for our framework: the excess risk bound converges at \(\mathcal{O}(1/n)\), which is as fast as centralized training. Motivated by these theoretical findings, we propose a novel algorithm that improves the performance of distributed learning under the Non-IID setting. Extensive experiments show that our algorithm outperforms other mainstream methods, which coincides with our theory.
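The aggregation idea in the abstract can be pictured with a small sketch. The snippet below is only an illustration, not the paper's algorithm (the actual implementation is in the repository linked in the Notes): it averages client model parameters with non-uniform mixture weights, using a hypothetical validation-loss heuristic in place of the paper's optimization of the weights, and contrasts the result with the uniform 1/m average.

```python
import numpy as np

def aggregate(client_params, mixture_weights):
    """Weighted average of per-client parameter vectors."""
    stacked = np.stack(client_params)           # shape (m, d)
    return mixture_weights @ stacked            # shape (d,)

def uniform_weights(m):
    """Uniform mixture weights: every client counts equally (1/m)."""
    return np.full(m, 1.0 / m)

def loss_based_weights(val_losses, temperature=1.0):
    """Hypothetical heuristic standing in for the paper's optimized weights:
    clients with lower local validation loss receive larger weight."""
    scores = -np.asarray(val_losses, dtype=float) / temperature
    scores -= scores.max()                      # numerical stability
    w = np.exp(scores)
    return w / w.sum()

# Toy example: 3 clients holding 4-dimensional linear models.
client_params = [np.array([1.0, 0.0, 0.0, 0.0]),
                 np.array([0.0, 1.0, 0.0, 0.0]),
                 np.array([0.0, 0.0, 1.0, 1.0])]
val_losses = [0.2, 0.9, 0.4]                    # pretend local validation losses

print("uniform :", aggregate(client_params, uniform_weights(len(client_params))))
print("weighted:", aggregate(client_params, loss_based_weights(val_losses)))
```

In the paper's framework the mixture weights are instead chosen to minimize the expected loss, which is what yields the theoretical and empirical improvement over uniform averaging.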

J. Li and B. Wei contributed equally to this work.


Notes

  1. Available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.

  2. Code is available at https://github.com/Bojian-Wei/Non-IID-Distributed-Learning-with-Optimal-Mixture-Weights.

Acknowledgement

This work was supported in part by the Excellent Talents Program of Institute of Information Engineering, CAS, the Special Research Assistant Project of CAS, the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), Beijing Natural Science Foundation (No. 4222029), and National Natural Science Foundation of China (No. 62076234, No. 62106257).

Author information

Corresponding author

Correspondence to Yong Liu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1110 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, J., Wei, B., Liu, Y., Wang, W. (2023). Non-IID Distributed Learning with Optimal Mixture Weights. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_33

  • DOI: https://doi.org/10.1007/978-3-031-26412-2_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science (R0)
