Abstract
To support various application scenarios, big data processing frameworks (BDPFs) such as Spark usually expose a large number of performance-critical configuration parameters. Since manual configuration is both labor-intensive and time-consuming, automatically tuning configuration parameters for BDPFs to achieve better performance has become an urgent need. To simultaneously address the corresponding challenges, such as the high-dimensional configuration space, we propose ATConf, a new black-box approach that automatically tunes the internal and external configuration parameters of BDPFs. Experimental results on our local distributed Spark cluster show that the best execution time achieved by ATConf is as much as 46.52% lower than that of the default configuration. Moreover, compared with four baselines under the same observation budget, ATConf further reduces the execution time relative to the default configuration by at least 4.10%.
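To make the black-box setting concrete, the sketch below shows a minimal tuning loop that searches a small configuration space under a fixed observation budget. It is an illustrative assumption, not ATConf's actual algorithm: the parameter subset is only a handful of real Spark knobs, the search strategy is plain random sampling, and `run_benchmark` is a hypothetical simulator standing in for launching a real Spark workload and measuring its runtime.

```python
import random

# Candidate values for a few real Spark parameters (an illustrative
# subset; ATConf tunes a much larger, high-dimensional space).
SPACE = {
    "spark.executor.cores": [1, 2, 4, 8],
    "spark.executor.memory": ["1g", "2g", "4g", "8g"],
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}

def run_benchmark(config):
    """Hypothetical stand-in for one 'observation': running a workload
    with `config` and measuring its execution time in seconds.
    A real tuner would instead submit a Spark job with these settings."""
    cores = config["spark.executor.cores"]
    mem_gb = int(config["spark.executor.memory"].rstrip("g"))
    parts = config["spark.sql.shuffle.partitions"]
    # Simulated cost model: more cores/memory help; extreme
    # shuffle-partition counts hurt.
    return 100.0 / cores + 50.0 / mem_gb + abs(parts - 200) * 0.05

def black_box_tune(budget=20, seed=0):
    """Random-search black-box tuning under a fixed observation budget."""
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    for _ in range(budget):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        t = run_benchmark(cfg)  # one observation of the black box
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time
```

The "observation budget" here corresponds to the constraint on observation times in the experiments: each candidate evaluation requires actually running the workload, so tuners are compared by how good a configuration they find within the same number of runs.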
Data availability
The data used to support the findings of this study are available from the corresponding author upon request.
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grants 61902440, 61872002 and 61802448.
Funding
This work was supported by the National Natural Science Foundation of China under Grants 61902440, 61872002 and 61802448.
Author information
Authors and Affiliations
Contributions
HD contributed significantly to the data analysis and manuscript writing; KW performed most of the experiments and drew the figures; YZ and PC contributed substantially to the conception of the study.
Corresponding author
Ethics declarations
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Dou, H., Wang, K., Zhang, Y. et al. ATConf: auto-tuning high dimensional configuration parameters for big data processing frameworks. Cluster Comput 26, 2737–2755 (2023). https://doi.org/10.1007/s10586-022-03767-0