DOI: 10.1145/3426745.3431338
Research article

Maggy: Scalable Asynchronous Parallel Hyperparameter Search

Published: 01 December 2020

ABSTRACT

Running extensive experiments is essential for building Machine Learning (ML) models. Such experiments usually require the iterative execution of many trials with varying run times. In recent years, Apache Spark has become the de facto standard for parallel data processing in industry, where iterative computations are implemented within the bulk-synchronous parallel (BSP) execution model. The BSP approach is also used to parallelize ML trials in Spark. However, BSP task synchronization barriers prevent the asynchronous execution of trials, which reduces the number of trials that can be run within a given computational budget. In this paper, we introduce Maggy, an open-source framework based on Spark that executes ML trials asynchronously in parallel, with the ability to early-stop poorly performing trials. In our experiments, we compare Maggy with the BSP execution of parallel trials in Spark and show that, for random hyperparameter search on a convolutional neural network trained on the Fashion-MNIST dataset, Maggy reduces the time required to execute a fixed number of trials by 33% to 58%, without any loss in final model accuracy.
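The abstract only sketches the core idea, so the following minimal Python sketch illustrates it conceptually: with asynchronous scheduling, a worker picks up a new randomly sampled trial as soon as its current trial finishes or is early-stopped, instead of waiting at a BSP-style barrier for the slowest trial in the batch. This is not Maggy's actual API; the search space, the `sample_config` and `run_trial` helpers, and the median-based early-stopping rule are hypothetical placeholders chosen purely for illustration.

```python
import concurrent.futures
import random
import statistics

# Hypothetical search space and budget, for illustration only.
SEARCH_SPACE = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
NUM_TRIALS = 20
NUM_WORKERS = 4

def sample_config():
    """Random search: draw each hyperparameter uniformly from its range."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}

def run_trial(config, completed_scores):
    """Stand-in for training a model with `config`.

    A real trial would train a CNN and report a validation metric
    periodically; here both metrics are faked with random numbers.
    """
    intermediate = random.random()  # placeholder early validation metric
    # Median-style early stopping: abandon a trial whose intermediate metric
    # falls below the median of the trials that have already completed.
    if completed_scores and intermediate < statistics.median(completed_scores):
        return config, intermediate, True   # early-stopped
    final = intermediate + 0.1 * random.random()  # placeholder final metric
    return config, final, False

completed_scores, results = [], []
with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    # Start one trial per worker; whenever any trial finishes (or is
    # early-stopped), immediately submit the next one. There is no
    # synchronization barrier across a whole batch of trials.
    pending = {pool.submit(run_trial, sample_config(), completed_scores)
               for _ in range(NUM_WORKERS)}
    submitted = NUM_WORKERS
    while pending:
        done, pending = concurrent.futures.wait(
            pending, return_when=concurrent.futures.FIRST_COMPLETED)
        for fut in done:
            config, score, stopped = fut.result()
            results.append((config, score, stopped))
            if not stopped:
                completed_scores.append(score)
            if submitted < NUM_TRIALS:
                pending.add(pool.submit(run_trial, sample_config(),
                                        completed_scores))
                submitted += 1

best = max((r for r in results if not r[2]), key=lambda r: r[1])
print("best config:", best[0], "score:", round(best[1], 3))
```

A BSP-style baseline, by contrast, would submit NUM_WORKERS trials as one stage and wait for all of them before launching the next batch, so each stage is bounded by its slowest trial; the asynchronous schedule above avoids that idle time, which is the effect the paper quantifies as a 33% to 58% reduction in wall-clock time.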


Supplemental Material

3426745.3431338.mp4 (video, 38.9 MB)


Published in

DistributedML'20: Proceedings of the 1st Workshop on Distributed Machine Learning
December 2020, 46 pages
ISBN: 9781450381826
DOI: 10.1145/3426745

Copyright © 2020 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States



Qualifiers

• Research article
• Refereed limited

Acceptance Rates

Overall acceptance rate: 5 of 10 submissions, 50%
