Maggy: Scalable Asynchronous Parallel Hyperparameter Search

ABSTRACT
Running extensive experiments is essential for building Machine Learning (ML) models. Such experiments usually require the iterative execution of many trials with varying run times. In recent years, Apache Spark has become the de facto industry standard for parallel data processing, in which iterative computations are implemented within the bulk-synchronous parallel (BSP) execution model. The BSP approach is also used to parallelize ML trials in Spark. However, BSP task synchronization barriers prevent the asynchronous execution of trials, reducing the number of trials that can be run within a given computational budget. In this paper, we introduce Maggy, an open-source framework based on Spark that executes ML trials asynchronously in parallel, with the ability to early-stop poorly performing trials. In our experiments, we compare Maggy with the BSP execution of parallel trials in Spark and show that, for random hyperparameter search over a convolutional neural network on the Fashion-MNIST dataset, Maggy reduces the time required to execute a fixed number of trials by 33% to 58%, with no loss in final model accuracy.
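To make the programming model concrete, the sketch below shows how such an asynchronous random search could be expressed with Maggy. It follows the usage illustrated in the project's README around the time of writing, so exact signatures may differ across Maggy versions; the CNN architecture, hyperparameter ranges, trial count, and experiment name are illustrative assumptions rather than the exact configuration from our experiments.

```python
# A minimal sketch of an asynchronous hyperparameter search with Maggy,
# assuming its early (v0.x) Python API. The architecture, search ranges,
# and trial count below are illustrative, not the paper's exact setup.
from maggy import experiment, Searchspace

# Hyperparameters to sample: each entry is (type, feasible interval).
sp = Searchspace(kernel=('INTEGER', [2, 8]),
                 pool=('INTEGER', [2, 8]),
                 dropout=('DOUBLE', [0.01, 0.99]))

def train(kernel, pool, dropout, reporter):
    """Trains one trial of the CNN; runs on a Spark executor."""
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = \
        tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
    x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, kernel, activation='relu',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    val_acc = 0.0
    for _ in range(5):
        model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
        _, val_acc = model.evaluate(x_test, y_test, verbose=0)
        # Heartbeat the current metric to the driver so that poorly
        # performing trials can be early-stopped while others continue.
        reporter.broadcast(metric=val_acc)
    return val_acc

# Schedules trials asynchronously on the available Spark executors.
result = experiment.lagom(train,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=100,
                          name='fashion_mnist_cnn')
```

Because each executor requests a new trial as soon as its current one finishes or is stopped early, no BSP barrier forces fast trials to wait for slow ones, which is the source of the reported speedup.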