Abstract
Distributed learning addresses the problem of training models on large-scale data and has attracted considerable attention in recent years. However, most existing distributed learning algorithms assign uniform mixture weights to clients when aggregating the global model, which impairs accuracy in the Non-IID (not independently and identically distributed) setting. In this paper, we present a general framework that optimizes the mixture weights and prove that it attains a lower expected loss than the uniform-weight framework. Moreover, we provide a strong generalization guarantee: the excess risk bound converges at rate \(\mathcal{O}(1/n)\), which is as fast as centralized training. Motivated by these theoretical findings, we propose a novel algorithm that improves the performance of distributed learning in the Non-IID setting. Extensive experiments show that our algorithm outperforms other mainstream methods, which is consistent with our theory.
J. Li and B. Wei contributed equally to this work.
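To illustrate the idea described in the abstract (uniform versus optimized mixture weights when aggregating client models), here is a minimal Python sketch. The loss-based softmax weighting rule, the function names, and the toy data are illustrative assumptions only; they are not the weight-optimization procedure proposed in the paper.

```python
import numpy as np

def aggregate(client_params, weights):
    """Form the global model as a convex combination of client parameter vectors."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize onto the probability simplex
    return sum(w * p for w, p in zip(weights, client_params))

def mixture_weights_from_losses(client_losses, temperature=1.0):
    # Hypothetical heuristic (not the paper's method): give smaller weight to
    # clients whose local loss is high, instead of the uniform 1/m of plain averaging.
    losses = np.asarray(client_losses, dtype=float)
    scores = np.exp(-losses / temperature)
    return scores / scores.sum()

# Toy usage: three clients, each holding a parameter vector and a local loss.
client_params = [np.array([1.0, 2.0]), np.array([0.8, 2.2]), np.array([3.0, 0.5])]
client_losses = [0.10, 0.12, 0.90]  # the third client fits its Non-IID data poorly

uniform_model = aggregate(client_params, np.ones(3) / 3)
weighted_model = aggregate(client_params, mixture_weights_from_losses(client_losses))
print("uniform aggregation :", uniform_model)
print("weighted aggregation:", weighted_model)
```

The point of the sketch is only that non-uniform mixture weights change the aggregated global model; the paper instead derives the weights by optimizing a theoretically grounded objective.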
Notes
1. Available at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/.
2. Code is available at https://github.com/Bojian-Wei/Non-IID-Distributed-Learning-with-Optimal-Mixture-Weights.
Acknowledgement
This work was supported in part by the Excellent Talents Program of Institute of Information Engineering, CAS, the Special Research Assistant Project of CAS, the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), Beijing Natural Science Foundation (No. 4222029), and National Natural Science Foundation of China (No. 62076234, No. 62106257).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, J., Wei, B., Liu, Y., Wang, W. (2023). Non-IID Distributed Learning with Optimal Mixture Weights. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol. 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_33
DOI: https://doi.org/10.1007/978-3-031-26412-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2
eBook Packages: Computer Science, Computer Science (R0)