Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics

Tzenetopoulos, Achilleas; Masouros, Dimosthenis; Xydis, Sotirios; Soudris, Dimitrios

doi:10.1007/s10766-024-00771-2

Published: 28 May 2024

Volume 52, pages 298–323 (2024)

International Journal of Parallel Programming

Abstract

Today, an ever-increasing number of workloads is deployed and executed on the Cloud. To serve and manage these huge computational demands effectively, data center operators and Cloud providers have embraced application co-location and multi-tenancy as first-class system design concerns. In addition, continuous advances in computer hardware have made it possible to seamlessly leverage heterogeneous pools of physical machines in data center environments. Even though modern Cloud schedulers and orchestrators adopt application-aware policies to automate time-consuming management tasks at scale, e.g., resource provisioning, they still rely on coarse-grained system metrics, such as CPU and/or memory utilization, to place incoming applications. As a result, they consider neither (1) the interference effects provoked by co-located tasks nor (2) the performance impact of the diverse characteristics of heterogeneous systems. Lacking this knowledge, existing state-of-the-art orchestration solutions cannot perform efficient allocations, which degrades the overall latency distribution delivered by the infrastructure. To alleviate this inefficiency, in this paper we present a machine learning (ML) based Cloud orchestration extension that takes both resource interference and heterogeneity into account. The framework schedules data-analytics applications onto a pool of heterogeneous resources. We evaluate the proposed solution on different application mixes and co-location scenarios and show that it improves the tail latency of the deployed applications' latency distribution by up to 3.6× compared to the state-of-the-art Kubernetes scheduler.
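
The abstract does not describe the model, its input features, or the scheduler integration, so the following Python sketch only illustrates the general idea of interference- and heterogeneity-aware placement under stated assumptions. The hardware classes (`hw-a`, `hw-b`), the two contention metrics (memory-bandwidth utilization and last-level-cache occupancy of co-runners), the profile data, and the k-nearest-neighbours regressor are all hypothetical stand-ins, not the authors' framework or its actual ML model. Each candidate node's runtime is predicted from past runs on the same hardware class under similar contention, and the node with the lowest prediction is chosen; `percentile` computes the kind of tail-latency metric (e.g. p99) used to compare placements.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One historical run of an application (all fields are hypothetical)."""
    hw_class: str         # hardware class of the node that ran it (heterogeneity)
    mem_bw_util: float    # memory-bandwidth utilization by co-runners, 0..1 (interference)
    llc_occupancy: float  # last-level-cache occupancy by co-runners, 0..1 (interference)
    runtime_s: float      # observed execution time in seconds


@dataclass(frozen=True)
class Node:
    name: str
    hw_class: str
    mem_bw_util: float    # current contention metrics sampled on the node
    llc_occupancy: float


def predict_runtime(history, node, k=2):
    """k-nearest-neighbours regression over profiles from the same hardware class."""
    same_hw = [p for p in history if p.hw_class == node.hw_class]
    if not same_hw:
        return math.inf  # never profiled on this hardware: do not prefer it

    def distance(p):
        # Euclidean distance in the (memory bandwidth, LLC occupancy) contention space
        return math.hypot(p.mem_bw_util - node.mem_bw_util,
                          p.llc_occupancy - node.llc_occupancy)

    nearest = sorted(same_hw, key=distance)[:k]
    return sum(p.runtime_s for p in nearest) / len(nearest)


def place(history, nodes):
    """Pick the node with the lowest predicted runtime for the incoming application."""
    return min(nodes, key=lambda n: predict_runtime(history, n)).name


def percentile(values, q):
    """Nearest-rank percentile; q=99 gives the p99 tail latency."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


history = [
    Profile("hw-a", 0.1, 0.2, 40.0),
    Profile("hw-a", 0.8, 0.7, 95.0),
    Profile("hw-a", 0.5, 0.5, 60.0),
    Profile("hw-b", 0.1, 0.1, 55.0),
    Profile("hw-b", 0.7, 0.8, 120.0),
    Profile("hw-b", 0.4, 0.3, 70.0),
]
nodes = [
    Node("node-1", "hw-a", 0.8, 0.7),  # faster hardware, heavily contended
    Node("node-2", "hw-b", 0.1, 0.1),  # slower hardware, nearly idle
]
print(place(history, nodes))  # node-2 (predicted 62.5 s vs 77.5 s)
```

In this toy setup, a policy that looked only at raw hardware speed would favour the `hw-a` node, whereas the contention-aware prediction selects the idle `node-2`. A production design would more likely train a proper regressor offline and expose the prediction through the orchestrator's scoring hooks; the nearest-neighbour model is used here only because it fits in a few self-contained lines.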

Acknowledgements

The research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Ph.D. Fellowships (Fellowship Number: 5349), and it was partially funded by the EU Horizon 2020 research and innovation programme, under project AIatEDGE, grant agreement No. 101015922.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, National Technical University of Athens, Iroon Polytechniou 9, 15772, Athens, Attica, Greece
Achilleas Tzenetopoulos, Dimosthenis Masouros, Sotirios Xydis & Dimitrios Soudris

Contributions

Achilleas Tzenetopoulos: Formal analysis, Conceptualization, Methodology, Software, Investigation, Experiments, Visualization, Writing – original draft. Dimosthenis Masouros: Formal analysis, Conceptualization, Methodology, Investigation, Supervision, Writing – original draft, Writing – review and editing. Sotirios Xydis: Formal analysis, Conceptualization, Methodology, Investigation, Supervision, Writing – review and editing. Dimitrios Soudris: Conceptualization, Supervision, Writing – review.

Corresponding author

Correspondence to Achilleas Tzenetopoulos.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests (as described by Springer) or personal relationships that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tzenetopoulos, A., Masouros, D., Xydis, S. et al. Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics. Int J Parallel Prog 52, 298–323 (2024). https://doi.org/10.1007/s10766-024-00771-2

Received: 30 January 2023
Accepted: 01 May 2024
Published: 28 May 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s10766-024-00771-2
