Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics

International Journal of Parallel Programming

Abstract

Today, an ever-increasing number of workloads is pushed to and executed on the Cloud. Data center operators and Cloud providers have embraced application co-location and multi-tenancy as first-class system design concerns to serve and manage these huge computational demands effectively. In addition, continuous advances in computer hardware have made it possible to seamlessly leverage heterogeneous pools of physical machines in data center environments. Even though modern Cloud schedulers and orchestrators adopt application-aware policies to automate time-consuming management tasks at scale, e.g., resource provisioning, they still rely on coarse-grained system metrics, such as CPU and/or memory utilization, to place incoming applications. They therefore do not consider (1) the interference effects provoked by co-located tasks, and (2) the performance impact of the diverse characteristics of heterogeneous systems. Lacking this knowledge, existing state-of-the-art orchestration solutions cannot perform efficient allocations, which negatively impacts the overall latency distribution delivered by the infrastructure. In this paper, to alleviate this inefficiency, we present a machine learning (ML) based Cloud orchestration extension that takes into account both resource interference and heterogeneity. The framework schedules data-analytics applications on a pool of heterogeneous resources. We evaluate the proposed solution on different application mixes and co-location scenarios. We show that the proposed framework improves the tail latency of the deployed applications by up to 3.6x compared to the state-of-the-art Kubernetes scheduler.
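
The contrast the abstract draws, between placement on coarse-grained utilization and placement that accounts for interference and heterogeneity, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` fields, the `sensitivity` parameter, and the closed-form `predicted_runtime` function are hypothetical stand-ins and do not reproduce the learned model or the scheduler extension described in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu_util: float   # coarse-grained metric, 0.0 to 1.0
    speed: float      # hypothetical relative hardware speed (1.0 = baseline machine)
    pressure: float   # hypothetical contention from co-located tasks, 0.0 to 1.0


def pick_by_utilization(nodes):
    """Baseline policy: choose the node with the lowest CPU utilization."""
    return min(nodes, key=lambda n: n.cpu_util)


def predicted_runtime(base_runtime, sensitivity, node):
    """Toy stand-in for a learned predictor (an assumption, not the paper's model).

    Faster hardware shortens the run; contention stretches it in proportion
    to how sensitive the application is to interference.
    """
    return base_runtime / node.speed * (1.0 + sensitivity * node.pressure)


def pick_by_prediction(nodes, base_runtime, sensitivity):
    """Aware policy: choose the node with the lowest predicted runtime."""
    return min(nodes, key=lambda n: predicted_runtime(base_runtime, sensitivity, n))


# Three illustrative nodes: a slow but idle machine, a fast machine with
# moderate load, and a fast machine under heavy shared-resource contention.
nodes = [
    Node("old-idle", cpu_util=0.10, speed=0.5, pressure=0.1),
    Node("new-busy", cpu_util=0.40, speed=1.0, pressure=0.2),
    Node("new-noisy", cpu_util=0.30, speed=1.0, pressure=0.9),
]

baseline_choice = pick_by_utilization(nodes)
aware_choice = pick_by_prediction(nodes, base_runtime=100.0, sensitivity=1.0)
```

With these made-up numbers the utilization-only policy selects the slow idle machine, while the prediction-based policy selects the faster machine with moderate contention. The sketch only shows why the two criteria can disagree; it says nothing about how large the effect is in practice.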

Acknowledgements

The research work was supported by the Hellenic Foundation for Research and Innovation (HFRI) under the 3rd Call for HFRI Ph.D. Fellowships (Fellowship Number: 5349), and it was partially funded by the EU Horizon 2020 research and innovation programme, under project AIatEDGE, grant agreement No. 101015922.

Author information

Contributions

Achilleas Tzenetopoulos: Formal analysis, Conceptualization, Methodology, Software, Investigation, Experiments, Visualization, Writing–original draft. Dimosthenis Masouros: Formal analysis, Conceptualization, Methodology, Investigation, Supervision, Writing–original draft, Writing–review and editing. Sotirios Xydis: Formal analysis, Conceptualization, Methodology, Investigation, Supervision, Writing–review and editing. Dimitrios Soudris: Conceptualization, Supervision, Writing–review.

Corresponding author

Correspondence to Achilleas Tzenetopoulos.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests, as described by Springer, or personal relationships that might be perceived to influence the results and/or discussion reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tzenetopoulos, A., Masouros, D., Xydis, S. et al. Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics. Int J Parallel Prog 52, 298–323 (2024). https://doi.org/10.1007/s10766-024-00771-2

  • DOI: https://doi.org/10.1007/s10766-024-00771-2
