DOI: 10.1145/3590140.3629112
Research Article

Characterizing Distributed Machine Learning Workloads on Apache Spark: (Experimentation and Deployment Paper)

Published: 27 November 2023

ABSTRACT

Distributed machine learning (DML) environments are widely used in many application domains to build decision-making systems. However, the complexity of these environments can overwhelm novice users. On the one hand, data scientists are familiar with hyper-parameter tuning but typically lack an understanding of the trade-offs and challenges of parameterizing DML platforms to achieve good performance. On the other hand, system administrators focus on tuning the distributed platform, often unaware of the implications platform settings can have on the quality of the learning models. To shed light on this parameter configuration interplay, we run multiple DML workloads on the widely used Apache Spark distributed platform, leveraging 13 popular learning methods and 6 real-world datasets on two distinct clusters. We collect workload execution traces and analyze them in depth to compare the efficiency of different configuration strategies: tuning only hyper-parameters, tuning only platform parameters, and jointly tuning both. We publicly release our collected traces and derive key takeaways on DML workloads. Counter-intuitively, platform parameters have a higher impact on model quality than hyper-parameters. More generally, we show that multi-level parameter configuration can yield better model quality and execution time while also optimizing resource costs.
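To make the two configuration levels concrete, the following PySpark sketch (not the authors' code; all parameter values and the dataset path are illustrative assumptions) shows where each level lives: platform parameters are set on the Spark session, while hyper-parameters are set on the MLlib estimator. Joint tuning, as studied in the paper, explores combinations across both levels.

    # Minimal sketch of the two configuration levels; values and the
    # dataset path below are illustrative assumptions, not the study's setup.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import RandomForestClassifier

    # Platform parameters: traditionally tuned by system administrators,
    # set on the Spark session.
    spark = (
        SparkSession.builder
        .appName("dml-config-sketch")
        .config("spark.executor.memory", "4g")          # executor heap size
        .config("spark.executor.cores", "2")            # cores per executor
        .config("spark.sql.shuffle.partitions", "64")   # shuffle parallelism
        .getOrCreate()
    )

    # Hyper-parameters: traditionally tuned by data scientists,
    # set on the MLlib estimator.
    rf = RandomForestClassifier(
        labelCol="label",
        featuresCol="features",
        numTrees=100,   # hyper-parameter
        maxDepth=8,     # hyper-parameter
    )

    train = spark.read.format("libsvm").load("data/train.libsvm")  # hypothetical path
    model = rf.fit(train)

In this sketch, changing spark.sql.shuffle.partitions or the executor sizing alters execution behavior without touching the learning method at all, while numTrees and maxDepth do the opposite; the interplay between these two levels is what the study quantifies.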


Published in

Middleware '23: Proceedings of the 24th International Middleware Conference
November 2023, 334 pages
ISBN: 9798400701771
DOI: 10.1145/3590140
Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 203 of 948 submissions, 21%