
DRPS: efficient disk-resident parameter servers for distributed machine learning

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

The parameter server (PS), the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems often depend on in-memory implementations. Under such memory constraints, machine learning (ML) developers cannot train large-scale ML models on their relatively small local clusters, and renting large-scale cloud servers is often economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel computation model (WSP) is proposed to strike the right balance between inconsistent parameter versions (staleness) and inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.
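To make the disk-resident idea concrete, the sketch below shows a minimal, single-machine parameter store whose values live in a file on disk and whose small in-memory index maps each parameter key to a file offset, so a pull or push touches only the bytes it needs. This is an illustrative sketch only; the class and method names (DiskParameterStore, pull, push) are assumptions for exposition and do not reflect DRPS's actual API, index structure, partitioning, or WSP scheduling.

```python
# Illustrative sketch of a disk-resident parameter store (not DRPS itself).
# Parameters are stored as fixed-width float64 values in a file; an
# in-memory index maps key -> byte offset, so pull/push read or write
# only the touched parameters instead of loading the whole model.
import os
import struct

VALUE_FMT = "d"                          # one float64 per parameter
VALUE_SIZE = struct.calcsize(VALUE_FMT)  # 8 bytes

class DiskParameterStore:
    def __init__(self, path, num_params):
        self.path = path
        self.index = {}                  # key -> byte offset in the file
        # Pre-allocate the parameter file (all zeros) and build the index.
        with open(path, "wb") as f:
            for key in range(num_params):
                self.index[key] = f.tell()
                f.write(struct.pack(VALUE_FMT, 0.0))
        self.file = open(path, "r+b")

    def pull(self, keys):
        """Read only the requested parameters from disk."""
        values = []
        for key in keys:
            self.file.seek(self.index[key])
            (v,) = struct.unpack(VALUE_FMT, self.file.read(VALUE_SIZE))
            values.append(v)
        return values

    def push(self, keys, grads, lr=0.1):
        """Apply sparse gradient updates in place, one seek per touched key."""
        for key, g in zip(keys, grads):
            self.file.seek(self.index[key])
            (v,) = struct.unpack(VALUE_FMT, self.file.read(VALUE_SIZE))
            self.file.seek(self.index[key])
            self.file.write(struct.pack(VALUE_FMT, v - lr * g))

    def close(self):
        self.file.close()

# Usage: a worker pulls the slice of the model it needs, computes
# gradients locally, and pushes sparse updates back.
store = DiskParameterStore("params.bin", num_params=10_000)
print(store.pull([3, 42]))               # -> [0.0, 0.0]
store.push([3, 42], grads=[0.5, -1.0])
print(store.pull([3, 42]))               # -> [-0.05, 0.1]
store.close()
os.remove("params.bin")
```

In a full PS deployment this store would be sharded across server nodes and accessed by workers over the network; how the keys are partitioned across servers and how worker progress is synchronized are exactly the concerns the paper's partitioning algorithm and WSP model address.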


Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFB1003404), the National Natural Science Foundation of China (Grant Nos. 62072083, U1811261, 61902366), the Basal Research Fund (N180716010), the Liaoning Revitalization Talents Program (XLYC1807158), and the China Postdoctoral Science Foundation (2020T130623).

Author information


Corresponding author

Correspondence to Yu Gu.

Additional information

Zhen Song received the master's degree in computer software and theory from Northeastern University, China in 2019. He is a PhD candidate at Northeastern University, China. His current research interests include distributed graph computation and distributed machine learning.

Yu Gu received the PhD degree in computer software and theory from Northeastern University, China in 2010. He is currently a professor and PhD supervisor at Northeastern University, China. His current research interests include big data analysis, spatial data management, and graph data management. He is a senior member of the China Computer Federation (CCF).

Zhigang Wang received the PhD degree in computer software and theory from Northeastern University, China in 2018. He is currently a lecturer in the College of Information Science and Engineering, Ocean University of China, China. He was a visiting PhD student at the University of Massachusetts Amherst, USA from December 2014 to December 2016. His research interests include cloud computing, distributed graph processing, and machine learning.

Ge Yu received the PhD degree in computer science from Kyushu University, Japan in 1996. He is currently a professor and PhD supervisor at Northeastern University, China. His research interests include distributed and parallel databases, OLAP and data warehousing, data integration, and graph data management. He is a member of IEEE, the IEEE Computer Society, and ACM, and a Fellow of the China Computer Federation (CCF).



About this article


Cite this article

Song, Z., Gu, Y., Wang, Z. et al. DRPS: efficient disk-resident parameter servers for distributed machine learning. Front. Comput. Sci. 16, 164321 (2022). https://doi.org/10.1007/s11704-021-0445-2

