DOI: 10.1145/2939672.2939829

FLASH: Fast Bayesian Optimization for Data Analytic Pipelines

Published: 13 August 2016

Abstract

Modern data science relies on data analytic pipelines to organize interdependent computational steps. Such pipelines often involve different algorithms across multiple steps, each with its own hyperparameters. To achieve the best performance, it is often critical to select the optimal algorithms and to set their hyperparameters appropriately, which requires substantial computational effort. Bayesian optimization provides a principled way to search for the optimal hyperparameters of a single algorithm. However, many challenges remain in solving pipeline optimization problems with high-dimensional, highly conditional search spaces. In this work, we propose Fast LineAr SearcH (FLASH), an efficient method for tuning analytic pipelines. FLASH is a two-layer Bayesian optimization framework that first uses a parametric model to select promising algorithms and then fits a nonparametric model to fine-tune their hyperparameters. FLASH also includes an effective caching algorithm that further accelerates the search process. Extensive experiments on a number of benchmark datasets demonstrate that FLASH significantly outperforms previous state-of-the-art methods in both search speed and accuracy. Using 50% of the time budget, FLASH achieves up to a 20% improvement in test error rate compared to the baselines. FLASH also yields state-of-the-art performance in a real-world application for healthcare predictive modeling.
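The abstract describes the two-layer design only at a high level. Below is a minimal sketch of the idea, assuming scikit-learn-style components; it is an illustration in the spirit of FLASH, not the authors' implementation. Layer 1 screens candidate algorithms with a cheap parametric score (here, the mean of a few random evaluations per algorithm), and layer 2 fine-tunes the surviving algorithm with a Gaussian-process surrogate and an expected-improvement acquisition. A small memoization dictionary stands in loosely for the caching algorithm the abstract mentions. The candidate algorithms and hyperparameter ranges are invented for the example.

```python
# A minimal two-layer search sketch in the spirit of FLASH -- an illustration,
# not the authors' implementation.
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Candidate algorithms, each with one log10-scaled hyperparameter in [lo, hi].
# These candidates and ranges are illustrative assumptions, not from the paper.
CANDIDATES = {
    "logreg": (lambda c: LogisticRegression(C=10.0 ** c, max_iter=2000),
               (-3.0, 3.0)),
    "forest": (lambda d: RandomForestClassifier(max_depth=int(round(10.0 ** d)),
                                                n_estimators=50, random_state=0),
               (0.0, 1.3)),
}

cache = {}  # memoized evaluations: a crude stand-in for FLASH's caching

def evaluate(name, h):
    """Mean 3-fold CV accuracy of algorithm `name` at hyperparameter `h`."""
    key = (name, round(float(h), 4))
    if key not in cache:
        build = CANDIDATES[name][0]
        cache[key] = cross_val_score(build(h), X, y, cv=3).mean()
    return cache[key]

# Layer 1: screen algorithms with a cheap parametric estimate (mean of a few
# random evaluations per algorithm), then keep only the most promising one.
scores = {}
for name, (_, (lo, hi)) in CANDIDATES.items():
    scores[name] = np.mean([evaluate(name, h) for h in rng.uniform(lo, hi, 3)])
best_algo = max(scores, key=scores.get)

# Layer 2: GP-based Bayesian optimization of the shortlisted algorithm.
lo, hi = CANDIDATES[best_algo][1]
H = list(rng.uniform(lo, hi, 3))
Y = [evaluate(best_algo, h) for h in H]
for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True)  # alpha: jitter for stability
    gp.fit(np.array(H).reshape(-1, 1), np.array(Y))
    grid = np.linspace(lo, hi, 200).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - max(Y)) / sigma                      # expected improvement (EI)
    ei = (mu - max(Y)) * norm.cdf(z) + sigma * norm.pdf(z)
    h_next = float(grid[int(np.argmax(ei))][0])    # next point to evaluate
    H.append(h_next)
    Y.append(evaluate(best_algo, h_next))

print(best_algo, max(Y))
```

In the paper the nonparametric layer operates over the full conditional pipeline space rather than a single one-dimensional hyperparameter, so this sketch only conveys the screen-then-refine structure.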



Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automated hyperparameter tuning
  2. bayesian optimization
  3. data analytic pipeline
  4. health analytics

Qualifiers

  • Research-article

Conference

KDD '16

Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)2
Reflects downloads up to 28 Feb 2025

Cited By

  • (2024) Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data. Eng, 5(1), 384-416. DOI: 10.3390/eng5010021. Online publication date: 29-Feb-2024.
  • (2024) Sparkle: Deep Learning Driven Autotuning for Taming High-Dimensionality of Spark Deployments. IEEE Transactions on Cloud Computing, 12(4), 1058-1073. DOI: 10.1109/TCC.2024.3437484. Online publication date: Oct-2024.
  • (2024) Platform Design for Privacy-Preserving Federated Learning using Homomorphic Encryption: Wild-and-Crazy-Idea Paper. 2024 Forum on Specification & Design Languages (FDL), 1-5. DOI: 10.1109/FDL63219.2024.10673864. Online publication date: 4-Sep-2024.
  • (2024) Grammar-based evolutionary approach for automated workflow composition with domain-specific operators and ensemble diversity. Applied Soft Computing, 153, 111292. DOI: 10.1016/j.asoc.2024.111292. Online publication date: Mar-2024.
  • (2023) Fast Bayesian optimization of Needle-in-a-Haystack problems using zooming memory-based initialization (ZoMBI). npj Computational Materials, 9(1). DOI: 10.1038/s41524-023-01048-x. Online publication date: 26-May-2023.
  • (2022) Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds. ACS Omega, 8(2), 2001-2009. DOI: 10.1021/acsomega.2c05145. Online publication date: 30-Dec-2022.
  • (2021) Automated Machine Learning for Healthcare and Clinical Notes Analysis. Computers, 10(2), 24. DOI: 10.3390/computers10020024. Online publication date: 22-Feb-2021.
  • (2021) Benchmark and Survey of Automated Machine Learning Frameworks. Journal of Artificial Intelligence Research, 70, 409-472. DOI: 10.1613/jair.1.11854. Online publication date: 1-May-2021.
  • (2020) Self-Service Data Science in Healthcare with Automated Machine Learning. Applied Sciences, 10(9), 2992. DOI: 10.3390/app10092992. Online publication date: 25-Apr-2020.
  • (2020) A Review of Hyperparameter Methods and Kits for Deep Neural Networks [in Turkish]. DÜMF Mühendislik Dergisi. DOI: 10.24012/dumf.767700. Online publication date: 2-Dec-2020.
  • Show More Cited By
