Maggy: Scalable Asynchronous Parallel Hyperparameter Search

ABSTRACT
Running extensive experiments is essential for building Machine Learning (ML) models. Such experiments usually require the iterative execution of many trials with varying run times. In recent years, Apache Spark has become the de facto industry standard for parallel data processing, in which iterative computations are implemented within the bulk-synchronous parallel (BSP) execution model. The BSP approach is also used to parallelize ML trials in Spark. However, BSP task synchronization barriers prevent the asynchronous execution of trials, reducing the number of trials that can be run within a given computational budget. In this paper, we introduce Maggy, an open-source framework based on Spark that executes ML trials asynchronously in parallel, with the ability to early-stop poorly performing trials. In our experiments, we compare Maggy with the BSP execution of parallel trials in Spark and show that, for random hyperparameter search over a convolutional neural network on the Fashion-MNIST dataset, Maggy reduces the time required to execute a fixed number of trials by 33% to 58%, with no loss in final model accuracy.
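To make the programming model concrete, the sketch below shows how such an asynchronous random search could be expressed with Maggy. It follows the usage illustrated in the project's README around the time of writing, so exact signatures may differ across Maggy versions; the CNN architecture, hyperparameter ranges, trial count, and experiment name are illustrative assumptions rather than the exact configuration from our experiments.

```python
# A minimal sketch of an asynchronous hyperparameter search with Maggy,
# assuming its early (v0.x) Python API. The architecture, search ranges,
# and trial count below are illustrative, not the paper's exact setup.
from maggy import experiment, Searchspace

# Hyperparameters to sample: each entry is (type, feasible interval).
sp = Searchspace(kernel=('INTEGER', [2, 8]),
                 pool=('INTEGER', [2, 8]),
                 dropout=('DOUBLE', [0.01, 0.99]))

def train(kernel, pool, dropout, reporter):
    """Trains one trial of the CNN; runs on a Spark executor."""
    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = \
        tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
    x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, kernel, activation='relu',
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    val_acc = 0.0
    for _ in range(5):
        model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
        _, val_acc = model.evaluate(x_test, y_test, verbose=0)
        # Heartbeat the current metric to the driver so that poorly
        # performing trials can be early-stopped while others continue.
        reporter.broadcast(metric=val_acc)
    return val_acc

# Schedules trials asynchronously on the available Spark executors.
result = experiment.lagom(train,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=100,
                          name='fashion_mnist_cnn')
```

Because each executor requests a new trial as soon as its current one finishes or is stopped early, no BSP barrier forces fast trials to wait for slow ones, which is the source of the reported speedup.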