DOI: 10.1145/2939672.2939829

FLASH: Fast Bayesian Optimization for Data Analytic Pipelines

Published: 13 August 2016

Abstract

Modern data science relies on data analytic pipelines to organize interdependent computational steps. Such pipelines often involve different algorithms across multiple steps, each with its own hyperparameters. To achieve the best performance, it is often critical to select the optimal algorithms and to set their hyperparameters appropriately, which requires substantial computational effort. Bayesian optimization provides a principled way to search for the optimal hyperparameters of a single algorithm. However, many challenges remain in solving pipeline optimization problems with high-dimensional, highly conditional search spaces. In this work, we propose Fast LineAr SearcH (FLASH), an efficient method for tuning analytic pipelines. FLASH is a two-layer Bayesian optimization framework that first uses a parametric model to select promising algorithms and then fits a nonparametric model to fine-tune their hyperparameters. FLASH also includes an effective caching algorithm that further accelerates the search process. Extensive experiments on a number of benchmark datasets demonstrate that FLASH significantly outperforms previous state-of-the-art methods in both search speed and accuracy. Using 50% of the time budget, FLASH achieves up to a 20% improvement in test error rate compared to the baselines. FLASH also yields state-of-the-art performance in a real-world application for healthcare predictive modeling.
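The abstract describes the two-layer design only at a high level. Below is a minimal sketch of the idea, assuming scikit-learn-style components; it is an illustration in the spirit of FLASH, not the authors' implementation. Layer 1 screens candidate algorithms with a cheap parametric score (here, the mean of a few random evaluations per algorithm), and layer 2 fine-tunes the surviving algorithm with a Gaussian-process surrogate and an expected-improvement acquisition. A small memoization dictionary stands in loosely for the caching algorithm the abstract mentions. The candidate algorithms and hyperparameter ranges are invented for the example.

```python
# A minimal two-layer search sketch in the spirit of FLASH -- an illustration,
# not the authors' implementation.
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Candidate algorithms, each with one log10-scaled hyperparameter in [lo, hi].
# These candidates and ranges are illustrative assumptions, not from the paper.
CANDIDATES = {
    "logreg": (lambda c: LogisticRegression(C=10.0 ** c, max_iter=2000),
               (-3.0, 3.0)),
    "forest": (lambda d: RandomForestClassifier(max_depth=int(round(10.0 ** d)),
                                                n_estimators=50, random_state=0),
               (0.0, 1.3)),
}

cache = {}  # memoized evaluations: a crude stand-in for FLASH's caching

def evaluate(name, h):
    """Mean 3-fold CV accuracy of algorithm `name` at hyperparameter `h`."""
    key = (name, round(float(h), 4))
    if key not in cache:
        build = CANDIDATES[name][0]
        cache[key] = cross_val_score(build(h), X, y, cv=3).mean()
    return cache[key]

# Layer 1: screen algorithms with a cheap parametric estimate (mean of a few
# random evaluations per algorithm), then keep only the most promising one.
scores = {}
for name, (_, (lo, hi)) in CANDIDATES.items():
    scores[name] = np.mean([evaluate(name, h) for h in rng.uniform(lo, hi, 3)])
best_algo = max(scores, key=scores.get)

# Layer 2: GP-based Bayesian optimization of the shortlisted algorithm.
lo, hi = CANDIDATES[best_algo][1]
H = list(rng.uniform(lo, hi, 3))
Y = [evaluate(best_algo, h) for h in H]
for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True)  # alpha: jitter for stability
    gp.fit(np.array(H).reshape(-1, 1), np.array(Y))
    grid = np.linspace(lo, hi, 200).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - max(Y)) / sigma                      # expected improvement (EI)
    ei = (mu - max(Y)) * norm.cdf(z) + sigma * norm.pdf(z)
    h_next = float(grid[int(np.argmax(ei))][0])    # next point to evaluate
    H.append(h_next)
    Y.append(evaluate(best_algo, h_next))

print(best_algo, max(Y))
```

In the paper the nonparametric layer operates over the full conditional pipeline space rather than a single one-dimensional hyperparameter, so this sketch only conveys the screen-then-refine structure.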



Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automated hyperparameter tuning
  2. bayesian optimization
  3. data analytic pipeline
  4. health analytics

Qualifiers

  • Research-article

Conference

KDD '16

Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (Last 12 months)26
  • Downloads (Last 6 weeks)2
Reflects downloads up to 28 Feb 2025

Cited By

  • (2024) Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data. Eng, 5(1), 384-416. DOI: 10.3390/eng5010021. Online publication date: 29-Feb-2024.
  • (2024) Sparkle: Deep Learning Driven Autotuning for Taming High-Dimensionality of Spark Deployments. IEEE Transactions on Cloud Computing, 12(4), 1058-1073. DOI: 10.1109/TCC.2024.3437484. Online publication date: Oct-2024.
  • (2024) Platform Design for Privacy-Preserving Federated Learning using Homomorphic Encryption: Wild-and-Crazy-Idea Paper. 2024 Forum on Specification & Design Languages (FDL), 1-5. DOI: 10.1109/FDL63219.2024.10673864. Online publication date: 4-Sep-2024.
  • (2024) Grammar-based evolutionary approach for automated workflow composition with domain-specific operators and ensemble diversity. Applied Soft Computing, 153, 111292. DOI: 10.1016/j.asoc.2024.111292. Online publication date: Mar-2024.
  • (2023) Fast Bayesian optimization of Needle-in-a-Haystack problems using zooming memory-based initialization (ZoMBI). npj Computational Materials, 9(1). DOI: 10.1038/s41524-023-01048-x. Online publication date: 26-May-2023.
  • (2022) Initial Sample Selection in Bayesian Optimization for Combinatorial Optimization of Chemical Compounds. ACS Omega, 8(2), 2001-2009. DOI: 10.1021/acsomega.2c05145. Online publication date: 30-Dec-2022.
  • (2021) Automated Machine Learning for Healthcare and Clinical Notes Analysis. Computers, 10(2), 24. DOI: 10.3390/computers10020024. Online publication date: 22-Feb-2021.
  • (2021) Benchmark and Survey of Automated Machine Learning Frameworks. Journal of Artificial Intelligence Research, 70, 409-472. DOI: 10.1613/jair.1.11854. Online publication date: 1-May-2021.
  • (2020) Self-Service Data Science in Healthcare with Automated Machine Learning. Applied Sciences, 10(9), 2992. DOI: 10.3390/app10092992. Online publication date: 25-Apr-2020.
  • (2020) A Review of Hyperparameter Methods and Kits for Deep Neural Networks [in Turkish]. DÜMF Mühendislik Dergisi. DOI: 10.24012/dumf.767700. Online publication date: 2-Dec-2020.
  • Show More Cited By
