Abstract
Big data analytics applications are increasingly deployed on cloud computing infrastructures, yet picking the optimal cloud configurations in a cost-effective way remains a major challenge. In this paper, we address this problem with high accuracy and low overhead. We propose Apollo, a data-driven approach that rapidly picks the optimal cloud configurations by reusing data from similar workloads. We first classify 12 typical workloads in BigDataBench by characterizing their pairwise correlations in our offline benchmarks. When a new workload arrives, we run it on several small datasets to rank its key characteristics and identify its similar workloads. Based on this ranking, we then narrow the search space of cloud configurations through a classification mechanism. Finally, we use a hierarchical regression model to determine which cluster is more suitable, and a local search strategy to pick the optimal cloud configurations within a few extra tests. Our evaluation on 12 typical workloads in HiBench shows that, compared with state-of-the-art approaches, Apollo improves search accuracy by up to 30% while reducing the overhead of picking the optimal cloud configurations by as much as 50%.
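The similarity step described above (rank a new workload's characteristics, then find the known workloads that correlate with it) can be illustrated with a minimal sketch. It assumes the pairwise correlation is a Kendall rank correlation and that each workload is profiled as a vector of runtimes over the same candidate configurations. The abstract does not state these details, so the function names, the profile layout, and all numbers below are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a rank correlation between two equal-length sequences.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def most_similar(new_profile, known_profiles, k=2):
    """Return the names of the k known workloads that rank-correlate best."""
    scores = {
        name: kendall_tau(new_profile, profile)
        for name, profile in known_profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical offline profiles: runtime in seconds on four candidate
# configurations, listed in the same order for every workload.
known = {
    "wordcount": [120, 90, 60, 45],
    "sort": [200, 210, 150, 160],
    "kmeans": [50, 70, 90, 110],
}

# Hypothetical profile of a new workload measured on small datasets.
new = [100, 80, 55, 40]

print(most_similar(new, known, k=2))
```

A rank-based measure is used here because it compares the ordering of configurations and ignores absolute runtimes, so workloads of very different sizes can still be matched. The search for the optimal configuration could then be restricted to configurations that worked well for the returned workloads.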
Cite this article
Wu, YW., Xu, YJ., Wu, H. et al. Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach. J. Comput. Sci. Technol. 36, 1184–1199 (2021). https://doi.org/10.1007/s11390-021-0232-4
DOI: https://doi.org/10.1007/s11390-021-0232-4