
A modular approach to build a hardware testbed for cloud resource management research


Abstract

Research on resource management focuses on optimizing system performance and energy efficiency by distributing shared resources, such as processor cores, caches, and main memory, among competing applications. This research spans a wide range of applications, including those from high-performance computing, machine learning, and mobile computing. Existing research frameworks often simplify the experimental work by concentrating on specific characteristics, such as the architecture of the computing nodes, resource monitoring, and representative workloads. This is typically the case with cloud systems, which introduce additional complexity in terms of hardware and software requirements. To avoid this complexity during research, experimental frameworks are being developed. Nevertheless, the proposed frameworks often fall short regarding the types of nodes included, virtualization support, and management of critical shared resources. This paper presents Stratus, an experimental framework that overcomes these limitations. Stratus includes different types of nodes, a comprehensive virtualization stack, and the ability to partition the major shared resources of the system. Although Stratus was originally conceived for cloud research, its modular design allows it to be extended, broadening its use to other computing domains and platforms while matching the complexity of modern cloud environments, as shown in the case studies presented in this paper.


Data availability

Stratus is publicly available at https://github.com/Lupones/Stratus.git. Tailbench applications are publicly available at https://github.com/supreethkurpad/Tailbench. The NAS Parallel Benchmarks can be downloaded from https://www.nas.nasa.gov/software/npb.html. The stress-ng and iperf3 microbenchmarks can be installed using the common package manager tools of Linux distributions (e.g., Ubuntu’s APT). Datasets from PMLB are publicly available at https://github.com/EpistasisLab/pmlb, and the code for the Decision Tree classifier ML method is available at https://github.com/rhiever/sklearn-benchmarks.

Notes

  1. Performance events: mem_load_retired.l3_miss, mem_load_retired.l3_hit and inst_retired.any.

  2. Performance events: Intel (inst_retired.any and cpu_clk_unhalted.ref_tsc), ARM (inst_retired and cycles).

References

  1. Mars J, Tang L (2013) Whare-map: heterogeneity in "homogeneous" warehouse-scale computers. In: Proceedings of ISCA, pp. 619–630

  2. Tang L, Mars J, Zhang X, Hagmann R, Hundt R, Tune E (2013) Optimizing Google’s warehouse scale computers: The NUMA experience. In: Proceedings of HPCA, pp. 188–197

  3. Gupta A, Milojicic D (2011) Evaluation of HPC applications on cloud. In: 2011 Sixth open cirrus summit, pp 22–26

  4. Netto MAS, Calheiros RN, Rodrigues ER, Cunha RLF, Buyya R (2018) HPC cloud for scientific and business applications: taxonomy, vision, and research challenges. ACM Comput Surv 51(1):1–29

  5. Hormozi E, Hormozi H, Akbari MK, Javan MS (2012) Using of machine learning into cloud environment (A Survey): Managing and scheduling of resources in cloud systems. In: Proceedings of 3PGCIC, pp. 363–368

  6. Sahoo J, Mohapatra S, Lath R (2010) Virtualization: a survey on concepts, taxonomy and associated security issues. In: Proceedings of ICCNT, pp. 222–226

  7. Serrano D, Bouchenak S, Kouki Y, de Oliveira Jr FA, Ledoux T, Lejeune J, Sopena J, Arantes L, Sens P (2016) SLA guarantees for cloud services. Futur Gener Comput Syst 54:233–246

  8. Buyya R, Ranjan R, Calheiros RN (2009) Modeling and simulation of scalable Cloud computing environments and the CloudSim toolkit: Challenges and opportunities. In: Proceedings of HPCS, pp. 1–11

  9. Kasture H, Sanchez D (2016) Tailbench: a benchmark suite and evaluation methodology for latency-critical applications. In: Proceedings of IISWC, pp. 1–10

  10. Masouros D, Xydis S, Soudris D (2020) Rusty: runtime interference-aware predictive monitoring for modern multi-tenant systems. IEEE Trans Parallel Distrib Syst 32(1):184–198

  11. Shekhar S, Abdel-Aziz H, Bhattacharjee A, Gokhale A, Koutsoukos X (2018) Performance interference-aware vertical elasticity for cloud-hosted latency-sensitive applications. In: Proceedings of CLOUD, pp. 82–89

  12. Chen S, Delimitrou C, Martínez JF (2019) PARTIES: QoS-aware resource partitioning for multiple interactive services. In: Proceedings of ASPLOS, pp. 107–120

  13. Pons L, Feliu J, Sahuquillo J, Gómez ME, Petit S, Pons J, Huang C (2023) Cloud white: detecting and estimating QoS degradation of latency-critical workloads in the public cloud. Futur Gener Comput Syst 138:13–25

  14. Chen Q, Xue S, Zhao S, Chen S, Wu Y, Xu Y, Song Z, Ma T, Yang Y, Guo M (2020) Alita: comprehensive performance isolation through bias resource management for public clouds. In: Proceedings of SC, pp. 1–13

  15. Gan Y, Liang M, Dev S, Lo D, Delimitrou C (2021) Sage: practical and scalable ML-driven performance debugging in microservices. In: Proceedings of ASPLOS, pp. 135–151

  16. Suresh A, Gandhi A (2021) ServerMore: opportunistic execution of serverless functions in the cloud. In: Proceedings of SoCC, pp. 570–584

  17. Patel T, Tiwari D (2020) CLITE: Efficient and QoS-aware co-location of multiple latency-critical jobs for warehouse scale computers. In: Proceedings of HPCA, pp. 193–206

  18. Chen S, Jin A, Delimitrou C, Martínez JF (2022) ReTail: opting for learning simplicity to enable QoS-aware power management in the cloud. In: Proceedings of HPCA, pp. 155–168

  19. Nishtala R, Petrucci V, Carpenter P, Sjalander M (2020) Twig: multi-agent task management for colocated latency-critical cloud services. In: Proceedings of HPCA, pp. 167–179

  20. Javadi SA, Suresh A, Wajahat M, Gandhi A (2019) Scavenger: a black-box batch workload resource manager for improving utilization in cloud environments. In: Proceedings of SoCC, pp. 272–285

  21. Pons L, Feliu J, Puche J, Huang C, Petit S, Pons J, Gómez ME, Sahuquillo J (2022) Effect of hyper-threading in latency-critical multithreaded cloud applications and utilization analysis of the major system resources. Futur Gener Comput Syst 131:194–208

  22. Ferdman M, Adileh A, Kocberber O, Volos S, Alisafaee M, Jevdjic D, Kaynak C, Popescu AD, Ailamaki A, Falsafi B (2012) Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In: Proceedings of ASPLOS

  23. Canonical Ltd: Ubuntu manpage: stress-ng. Available at https://manpages.ubuntu.com/manpages/artful/man1/stress-ng.1.html. Accessed: 2022-11-20 (2020)

  24. ESnet, NLANR, DAST: iPerf tool for network bandwidth measurements. Available at https://iperf.fr/. Accessed: 2022-11-20 (2020)

  25. Why would a cloud computing company use the SPEC CPU2017 benchmark suite? Available at https://www.spec.org/cpu2017/publications/DO-case-study.html. Accessed: 2019-08-02 (2017)

  26. Pons L, Petit S, Pons J, Gómez ME, Huang C, Sahuquillo J (2023) Stratus: A hardware/software infrastructure for controlled cloud research. In: Proceedings of PDP, pp. 299–306

  27. Belalem G, Tayeb FZ, Zaoui W (2010) Approaches to improve the resources management in the simulator CloudSim. In: Proceedings of ICICA, pp. 189–196. Springer

  28. Liang B, Dong X, Wang Y, Zhang X (2020) Memory-aware resource management algorithm for low-energy cloud data centers. Futur Gener Comput Syst 113:329–342

  29. Liu C, Li W, Wan J, Li L, Ma Z, Wang Y (2022) Resource management in cloud based on deep reinforcement learning. In: Proceedings of ICCCI, pp. 28–33

  30. Badia S, Carpen-Amarie A, Lèbre A, Nussbaum L (2013) Enabling large-scale testing of IaaS cloud platforms on the grid’5000 testbed. In: Proceedings of the International Workshop on Testing the Cloud, pp. 7–12

  31. Duplyakin D, Ricci R, Maricq A, Wong G, Duerig J, Eide E, Stoller L, Hibler M, Johnson D, Webb K, et al.: (2019) The design and operation of CloudLab. In: Proceedings of USENIX ATC, pp. 1–14

  32. Keahey K, Anderson J, Zhen Z, Riteau P, Ruth P, Stanzione D, Cevik M, Colleran J, Gunawi HS, Hammock C, et al.: (2020) Lessons learned from the chameleon testbed. In: Proceedings of USENIX ATC, pp. 219–233

  33. Sfakianakis Y, Marazakis M, Bilas A (2021) Skynet: Performance-driven resource management for dynamic workloads. In: Proceedings of CLOUD, pp. 527–539

  34. Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 72–81

  35. Cai B, Li K, Zhao L, Zhang R (2022) Less provisioning: a hybrid resource scaling engine for long-running services with tail latency guarantees. IEEE Trans Cloud Comput 10(3):1941–1957

  36. Ma L, Liu Z, Xiong J, Jiang D (2022) QWin: Core allocation for enforcing differentiated tail latency SLOs at shared storage backend. In: 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), pp. 1098–1109

  37. Zhang Y, Chen J, Jiang X, Liu Q, Steiner IM, Herdrich AJ, Shu K, Das R, Cui L, Jiang L (2021) LIBRA: clearing the cloud through dynamic memory bandwidth management. In: Proceedings of HPCA, pp. 815–826

  38. Dean J, Barroso LA (2013) The tail at scale. Commun ACM 56(2):74–80

  39. Nishtala R, Fugal H, Grimm S, Kwiatkowski M, Lee H, Li HC, McElroy R, Paleczny M, Peek D, Saab P et al.: (2013) Scaling Memcache at Facebook. In: Proceedings of NSDI, pp. 385–398

  40. Li J, Sharma NK, Ports DR, Gribble SD (2014) Tales of the tail: Hardware, os, and application-level sources of tail latency. In: Proceedings of SOCC, pp. 1–14

  41. Google Cloud Compute Engine - CPU platforms [online]. Available at https://cloud.google.com/compute/docs/cpu-platforms. Accessed: 2022-11-14 (2022)

  42. Amazon’s EC2 [online]. Available at https://aws.amazon.com/ec2/instance-types/?nc1=h_ls. Accessed: 2022-11-14 (2022)

  43. Huawei Elastic Cloud Server (ECS) [online]. Available at https://www.huaweicloud.com/intl/en-us/product/ecs.html. Accessed: 2022-11-14 (2022)

  44. Sefraoui O, Aissaoui M, Eleuldj M (2012) OpenStack: toward an open-source solution for cloud computing. Int J Comput Appl 55(3):38–42

  45. Kivity A, Kamay Y, Laor D, Lublin U, Liguori A (2007) kvm: the Linux virtual machine monitor. In: Proceedings of the Linux Symposium, vol. 1, pp. 225–230

  46. Barham P, Dragovic B, Fraser K, Hand S, Harris T, Ho A, Neugebauer R, Pratt I, Warfield A (2003) Xen and the art of virtualization. ACM SIGOPS Op Syst Rev 37(5):164–177

  47. Amazon Web Services [online]. Available at https://aws.amazon.com/ec2/faqs/?nc1=h_ls. Accessed: 2022-11-28 (2022)

  48. Google Compute Engine FAQ [online]. Available at https://cloud.google.com/compute/docs/faq. Accessed: 2022-11-28 (2022)

  49. Libvirt: The virtualization API [online]. Available at https://libvirt.org. Accessed: 2022-11-28 (2022)

  50. QEMU [online]. Available at https://www.qemu.org. Accessed: 2022-11-28 (2022)

  51. Pfaff B, Pettit J, Koponen T, Jackson E, Zhou A, Rajahalme J, Gross J, Wang A, Stringer J, Shelar P, Amidon K, Casado M (2015) The design and implementation of open vSwitch. In: Proceedings of NSDI, pp. 117–130. USENIX Association

  52. Russell R (2008) virtio: towards a de-facto standard for virtual I/O devices. ACM SIGOPS Op Syst Rev 42(5):95–103

  53. DPDK [online]. Available at https://www.dpdk.org/. Accessed: 2022-11-28 (2022)

  54. Weil S, Brandt S, Miller E, Long D, Maltzahn C (2006) Ceph: A scalable, high-performance distributed file system. In: Proceedings of OSDI, pp. 307–320

  55. Padoin EL, Pilla LL, Castro M, Boito FZ, Navaux POA, Méhaut J-F (2015) Performance/energy trade-off in scientific computing: the case of ARM big.LITTLE and Intel Sandy Bridge. IET Computers & Digital Techniques 9(1):27–35

  56. Criado J, Garcia-Gasulla M, Kumbhar P, Awile O, Magkanaris I, Mantovani F (2020) CoreNEURON: performance and energy efficiency evaluation on intel and arm CPUs. In: Proceedings of CLUSTER, pp. 540–548 . IEEE

  57. Mitra G, Johnston B, Rendell AP, McCreath E, Zhou J (2013) Use of SIMD vector operations to accelerate application code performance on low-powered ARM and intel platforms. In: Proceedings of IPDPSW, pp. 1107–1116

  58. Flynn P, Yi X, Yan Y (2022) Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures. In: Proceedings of PMAM, pp. 11–20

  59. Gleixner T, Molnar I (2009) Performance counters for Linux

  60. Intel Corporation: Intel RDT Library. Available at https://github.com/intel/intel-cmt-cat (2021)

  61. Jia R, Yang Y, Grundy J, Keung J, Hao L (2021) A systematic review of scheduling approaches on multi-tenancy cloud platforms. Inf Softw Technol 132:106478

  62. Wang Z, Xu C, Agrawal K, Li J (2022) Adaptive scheduling of multiprogrammed dynamic-multithreading applications. J Parallel Distrib Comput 162:76–88

  63. Lu C, Ye K, Xu G, Xu C-Z, Bai T (2017) Imbalance in the cloud: An analysis on Alibaba cluster trace. In: Proceedings of Big Data, pp. 2884–2892

  64. Liu Q, Yu Z (2018) The Elasticity and Plasticity in Semi-Containerized Co-Locating Cloud Workload: A View from Alibaba Trace. In: Proceedings of SoCC, pp. 347–360

  65. Cortez E, Bonde A, Muzio A, Russinovich M, Fontoura M, Bianchini R (2017) Resource central: understanding and predicting workloads for improved resource management in large cloud platforms. In: Proceedings of SOSP, pp. 153–167

  66. Intel: Improving real-time performance by utilizing cache allocation technology. Intel Corporation, April (2015)

  67. Andrew H, Abbasi Khawar M, Marcel C (2019) Introduction to Memory Bandwidth Allocation. Available at https://software.intel.com/en-us/articles/introduction-to-memory-bandwidth-allocation

  68. Lo D, Cheng L, Govindaraju R, Ranganathan P, Kozyrakis C (2015) Heracles: Improving resource efficiency at scale. In: Proceedings of ISCA, pp. 450–462

  69. Yang H, Breslow A, Mars J, Tang L (2013) Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers. In: Proceedings of ISCA, pp. 607–618. Association for Computing Machinery, New York, NY, USA

  70. Moradi H, Wang W, Zhu D (2020) DiHi: distributed and hierarchical performance modeling of multi-VM cloud running applications. In: Proceedings of HPCC/SmartCity/DSS, pp. 1–10

  71. Gupta A, Kale LV, Gioachin F, March V, Suen CH, Lee B-S, Faraboschi P, Kaufmann R, Milojicic D (2013) The Who, What, Why, and How of High Performance Computing in the Cloud. In: Proceedings of CloudCom, vol. 1, pp. 306–314

  72. Gupta A, Kalé LV, Milojicic D, Faraboschi P, Balle SM (2013) HPC-Aware VM Placement in Infrastructure Clouds. In: Proceedings of IC2E, pp. 11–20

  73. Jin H-Q, Frumkin M, Yan J (1999) The OpenMP implementation of NAS parallel benchmarks and its performance

  74. Jackson A, Turner A, Weiland M, Johnson N, Perks O, Parsons M (2019) Evaluating the arm ecosystem for high performance computing. In: Proceedings of PASC, pp. 1–11. Association for Computing Machinery

  75. Chen S, Galon S, Delimitrou C, Manne S, Martínez JF (2017) Workload characterization of interactive cloud services on big and small server platforms. In: Proceedings of IISWC, pp. 125–134

  76. Hammond SD, Hughes C, Levenhagen MJ, Vaughan CT, Younge AJ, Schwaller B, Aguilar MJ, Pedretti KT, Laros JH (2019) Evaluating the Marvell ThunderX2 Server Processor for HPC Workloads. In: Proceedings of HPCS, pp. 416–423

  77. Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017) PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10(1):1–13

  78. Olson RS, La Cava W, Mustahsan Z, Varik A, Moore JH (2017) Data-driven Advice for Applying Machine Learning to Bioinformatics Problems. arXiv e-print. arXiv:1708.05070

  79. Soria Pardos V (2019) Characterization of HPC applications for ARM SIMD instructions. B.S. thesis, Universitat Politècnica de Catalunya

  80. Feliu J, Sahuquillo J, Petit S, Duato J (2013) L1-bandwidth aware thread allocation in multicore SMT processors. In: Proceedings of PACT, pp. 123–132

  81. Slimani S, Hamrouni T, Ben Charrada F (2021) Service-oriented replication strategies for improving quality-of-service in cloud computing: a survey. Clust Comput 24:361–392

  82. Barroso LA, Clidaras J, Hölzle U (2013) The datacenter as a computer: an introduction to the design of warehouse-scale machines. Second Edition. https://doi.org/10.2200/S00516ED2V01Y201306CAC024

  83. Kang Y, Zheng Z, Lyu MR (2012) A latency-aware co-deployment mechanism for cloud-based services. In: Proceedings of CLOUD, pp. 630–637

  84. Zhang Y, Hua W, Zhou Z, Suh GE, Delimitrou C (2021) Sinan: ML-Based and QoS-Aware resource management for cloud microservices. In: Proceedings of ASPLOS. Association for Computing Machinery, New York, NY, USA

  85. Zhang I, Raybuck A, Patel P, Olynyk K, Nelson J, Leija OSN, Martinez A, Liu J, Simpson AK, Jayakar S et al.: (2021) The demikernel datapath os architecture for microsecond-scale datacenter systems. In: Proceedings of SOSP, pp. 195–211

  86. Michael Bayer et al.: Mako Templates. Available at http://www.makotemplates.org/ (2019)

  87. Mills DL (1991) Internet time synchronization: the network time protocol. IEEE Trans Commun 39(10):1482–1493


Acknowledgements

Not applicable.

Funding

This work has been supported by the Spanish Ministerio de Universidades under the grant FPU18/01948 and by the Spanish Ministerio de Ciencia e Innovación and European ERDF under grants PID2021-123627OB-C51 and TED2021-130233B-C32, and by Generalitat Valenciana under Grant AICO/2021/266.

Author information


Contributions

All authors contributed equally to the work’s conception and design. Platform design and software development were performed by LP, SP, and JP. Data collection and representation were performed by LP. JS was in charge of project administration, supervision, and data validation. JS, MEG, and SP were responsible for funding acquisition and administration. All authors contributed to the writing and reviewing of the manuscript.

Corresponding author

Correspondence to Lucia Pons.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Stratus resource and application manager

To ease reproducibility, this appendix presents the main methods used in the Stratus Resource and Application Manager. Below, each of the steps performed to carry out a typical experiment with one or more VMs running applications is discussed in detail.

A.1 Execution of experiments

1) Define experiment: workload and parameters. As a prior step, the workload (i.e., the VMs and the applications to be run on them) and the experimental conditions must be defined. To ease this task, Stratus makes use of MAKO templates [86], which provide a simple and intuitive language to specify the parameters of the experiments. Figure 14 shows an example of a MAKO template with a configuration to run a VM with the xapian Tailbench application.

Fig. 14: Example of a template to execute the TailBench application xapian in a VM

The template begins with an include directive, which allows including detailed information (stored in the applications.mako file) regarding the command used to execute each application. In the tasks section of the template, the user must indicate which applications (app) are going to be executed, specifying their VM (domain_name, snapshot_name, ip, and port). The VCPUs are set according to the physical CPUs defined in the cpus field. Regarding the application arguments, the arguments field configures application-specific input parameters for the server-side workload (e.g., the number of requests). Similarly, the client-arguments field defines the client-side arguments, such as the number of queries per second.

The cmd section details information about the execution of the framework, such as the length of each interval or quantum in seconds (ti), the maximum number of intervals to be executed (mi), the core(s) where the framework process is pinned (cpu-affinity), and the list of events to be monitored using hardware performance counters (event).

Notice that the template can also specify whether VMs are only allowed to use a partition of a shared resource (LLC, memory bandwidth, network bandwidth, or disk bandwidth). For example, in Fig. 14, the VM can only make use of 4 of the 11 available LLC ways.

2) Execute the launch script to start the manager. To start running an experiment, the user executes the launch script. First, the script prepares the execution environment; for instance, it fixes the processor frequency to avoid variability across experiments.

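The original listing is shown as a figure in the article; a minimal sketch of such a step, assuming the cpupower utility is available and using a hypothetical target frequency of 2.1 GHz:

# Fix all cores to a constant frequency to reduce run-to-run variability
$ sudo cpupower frequency-set -g performance
$ sudo cpupower frequency-set -d 2.1GHz -u 2.1GHz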

Additionally, the clocks of both the server and client machines are synchronized to ensure that server- and client-collected metrics are aligned, using the Network Time Protocol (NTP) [87] with a known NTP time server (e.g., europe.pool.ntp.org).

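Again, the exact listing appears as a figure in the article; a minimal sketch of a one-shot synchronization against the server mentioned above (ntpdate is an assumption, any NTP client would do):

# Run on both the server and the client machine
$ sudo ntpdate europe.pool.ntp.org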

When the environment is ready, the configuration file is generated with the workloads to execute and all the experiment parameters from the MAKO template. Then, the manager starts to run.

3) Prepare VMs for execution. The first step the manager performs is setting up and starting the VMs using both libvirt and Ceph libraries and utilities. To reduce the start-up overhead, the manager uses the snapshot feature of Ceph. A snapshot is a copy of the state of a VM, including the disk and main memory contents; it preserves a VM’s state and data at a given point in time, so the VM can be reverted to that state at any moment. For each VM, we have taken a snapshot after the OS boot process has completed, so the VM is ready to receive the command to launch the target benchmark. Thus, we load the snapshot and then start running the VM:

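A minimal sketch of this step with the libvirt CLI, assuming a hypothetical domain vm1 with a snapshot named booted taken right after the OS boot (the article combines libvirt with Ceph utilities for the disk images):

# Revert the VM to the post-boot snapshot and resume it
$ virsh snapshot-revert vm1 booted --running
# Alternatively, start the domain if it was reverted in a shut-off state
$ virsh start vm1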

Once the VMs are started and the snapshots are loaded, the number of CPUs of each VM (i.e., VCPUs) can be modified in case a multi-threaded application is going to be executed and more than one CPU is required.

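A sketch of the VCPU hot-add step with the libvirt CLI (domain name and VCPU count are hypothetical):

# Raise the number of VCPUs of the running VM to 4
$ virsh setvcpus vm1 4 --live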

After the VCPUs are added, they are enabled in the guest.

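A sketch of enabling the hot-added CPUs inside the guest, assuming a Linux guest reachable over SSH:

# Bring the newly added CPUs (e.g., CPUs 1-3) online inside the guest
$ ssh user@vm1 'for c in 1 2 3; do echo 1 | sudo tee /sys/devices/system/cpu/cpu$c/online; done'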

In addition, we can specify the affinity of the VCPUs to cores of the host machine (defined in cpumap):

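A sketch of pinning VCPUs to host cores with the libvirt CLI (the VCPU-to-core mapping is hypothetical):

# Pin VCPU 0 of vm1 to host core 4 and VCPU 1 to host core 5
$ virsh vcpupin vm1 0 4
$ virsh vcpupin vm1 1 5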

4) Set up resource monitoring and partitioning. With QEMU, each VCPU is associated with a process ID (PID) in the host OS. These PIDs are required to monitor hardware performance counters with Perf individually for each thread (i.e., VCPU) of the VM. Similarly, LLC and memory bandwidth monitoring is performed on a PID basis. Therefore, the manager must get the list of PIDs of the VCPUs:

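A sketch of one way to obtain these host-side thread IDs through the QEMU monitor (the article retrieves them through libvirt's API; the domain name is hypothetical):

# List the QEMU vCPU threads; the thread_id column is the PID used with Perf and PQoS
$ virsh qemu-monitor-command vm1 --hmp "info cpus"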

The remaining resources, network and disk bandwidth, are monitored per VM. The manager also allows partitioning the main system shared resources, assigning each VM a share of a given resource. Therefore, if specified in the template, a resource share is allocated to the VM (more details in Appendix A.2).

5) Start running applications in the VMs. When the VMs are operational and ready to start executing the applications, an SSH command is sent to each VM to start the execution of each workload.

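A sketch of this launch step, assuming a hypothetical guest user, IP address, and benchmark script:

# Launch the server-side workload inside the VM in the background
$ ssh user@192.168.122.10 'nohup ./run_xapian_server.sh > run.log 2>&1 &'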

Stratus’ manager is adapted to support the execution of client–server workloads (e.g., TailBench benchmark suite) and best-effort or batch workloads (e.g., stressor microbenchmarks or SPEC CPU benchmarks). In the case of a client–server workload, an SSH command is sent to the client node to start running the clients, which send requests to the server (already running).

6) Perform actions in each quantum. Once the execution starts, the manager executes the main loop (see Fig. 4) for the rest of the execution time. In each iteration, the manager is suspended for a given quantum length (established in the template). Then, data is collected from different sources (e.g., hardware performance counters, Linux file system, Intel library, or libvirt) to monitor the main system resources (CPU usage, LLC occupancy, main memory, network, and disk bandwidth). Additionally, the manager is adapted to allow implementing and applying QoS policies. For instance, policies that manage resource sharing among VMs [12, 14, 20], predict interference among VMs [11, 13, 17, 18, 61] or schedule VMs [62].

7) Execution end. The main loop ends when the manager detects that all the VMs have finished running their applications, at which point it shuts down the running VMs.

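A sketch of the shutdown step with the libvirt CLI (domain name hypothetical):

# Request a clean guest shutdown; force it off only if the guest does not respond
$ virsh shutdown vm1
$ virsh destroy vm1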

All the data collected from the hardware performance counters and system resources are stored in CSV files, ready to be processed. The events monitored by the hardware performance counters are those specified in template.mako (events field in the cmd section). The supported events can be listed with # perf list or by checking the events supported by the microarchitecture of the machine. To monitor the main shared resources of the system, the framework collects data regarding LLC occupancy and memory, disk, and network bandwidth (more details can be found in the following section).
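As an illustration, the per-VCPU counters described above could be collected with a Perf invocation along these lines (the events are those mentioned in the notes; the PID and interval are hypothetical):

# Sample two of the events used in the paper every second for vCPU thread 12345, in CSV format
$ perf stat -e inst_retired.any,mem_load_retired.l3_miss -p 12345 -I 1000 -x,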

Additionally, for characterization and debugging purposes, statistics and data are also collected inside the VMs. For instance, in Tailbench workloads, the clients report results such as latency per query, queries per interval, and tail latency.

A.2 Monitoring and partitioning main shared resources

Driven by the popularity of resource management research in the last few years, server processors have been equipped with advanced technologies that allow monitoring and partitioning the major system resources.

Below, we explain how monitoring and partitioning of each shared resource is implemented in Stratus without relying on any external tool.

CPU Utilization. To obtain the utilization of each CPU, we use the data collected from the file /proc/stat, which reports statistics about kernel activity aggregated since the system first booted. Additionally, statistics on the CPU utilization of each VM (i.e., per-VCPU utilization) are obtained using libvirt’s function virDomainGetCPUStats.
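A sketch of both sources from the command line (the domain name is hypothetical); virsh cpu-stats is the CLI front end of virDomainGetCPUStats:

# Per-CPU aggregate counters since boot (user, nice, system, idle, ...)
$ grep '^cpu' /proc/stat
# Per-vCPU and total CPU time of a VM, as reported by libvirt
$ virsh cpu-stats vm1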

Last Level Cache (LLC). Recently, some processor manufacturers like Intel have developed technologies that allow monitoring and partitioning the LLC. In Intel processors, these technologies are known as Cache Monitoring Technology (CMT) and Cache Allocation Technology (CAT) [66]. Partitioning is performed using Classes of Service (CLOS), which can be defined either as groups of applications (PIDs) or as groups of logical cores to which a partition of the LLC is assigned. The LLC is partitioned on a per-way basis; that is, a cache way is the allocation granularity of a CLOS.

To monitor the amount of LLC space occupied by each application, the performance event PQOS_MON_EVENT_L3_OCCUP reports the occupancy in bytes.
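As an illustration, the same functionality can be exercised from the command line with the pqos utility of the Intel RDT software [60] (the CLOS number, way mask, and core list are hypothetical; Stratus drives these technologies through the library API instead):

# Define CLOS 1 with a 4-way LLC mask and attach host cores 4-7 to it
$ sudo pqos -e "llc:1=0x00f"
$ sudo pqos -a "llc:1=4-7"
# Monitor the LLC occupancy of those cores
$ sudo pqos -m "llc:4-7"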

Memory Bandwidth. Recent Intel Xeon Scalable processors introduce Memory Bandwidth Allocation (MBA) [67], which allows distributing memory bandwidth among the running applications. More precisely, it allows controlling the memory bandwidth between the L2 and the L3 (i.e., LLC) caches. Similarly to CAT, MBA works with CLOS; that is, MBA bandwidth limits apply only to CLOS, to which the user can assign tasks (PIDs) or cores. However, MBA operates on a per-core basis: if two applications running on the same core are limited with different values, the maximum limitation is the one applied to that core.

Regarding memory bandwidth monitoring, three performance events are available:

  1. PQOS_MON_EVENT_LMEM_BW monitors the local memory bandwidth (that is, memory read from the same processor socket).

  2. PQOS_MON_EVENT_RMEM_BW monitors the remote memory bandwidth (that is, memory read from another processor socket).

  3. PQOS_MON_EVENT_TMEM_BW monitors the total memory bandwidth.

These events return the number of bytes read from memory up to the time the performance counter is polled. Therefore, to calculate the actual memory bandwidth, we must subtract the previous reading from the current one and divide the result by the time elapsed between both readings (i.e., the interval length).
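A sketch of the command-line equivalent of MBA and the bandwidth events (the throttling percentage, CLOS, and cores are hypothetical; Stratus computes the bandwidth itself from the raw byte counters as described above):

# Throttle CLOS 1 to roughly 50% of the available memory bandwidth and attach cores 4-7
$ sudo pqos -e "mba:1=50"
$ sudo pqos -a "llc:1=4-7"
# Monitor the local and remote memory bandwidth of those cores
$ sudo pqos -m "mbl:4-7;mbr:4-7"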

To monitor and partition the LLC and memory bandwidth from Stratus, we have used the Intel RDT (PQoS) library [60]. This library provides methods and directives to configure and use all the resource partitioning technologies. More specifically, we have used the following headers:

  1. pqos.h, which contains the platform QoS API and data structure definitions. Different methods are defined for the LLC and memory bandwidth.

  2. os_monitoring.h, which contains the methods of the PQoS OS monitoring module. The monitoring methods from pqos.h cannot be used because they are incompatible with Perf, so this module must be used instead. The same methods are used to monitor LLC occupancy and memory bandwidth. First, at the start of the execution, monitoring must be set up for each task with the method monitor_setup. Then, in each interval, the values of each performance event are retrieved with the method monitor_get_values.

Disk Bandwidth. I/O access to the disks can be monitored using the virsh tool or libvirt’s API. Both mechanisms offer the same functionality and allow monitoring the number of read, write, and flush operations, the number of bytes read and written, and the total duration of the read, write, and flush operations. To integrate disk I/O monitoring into Stratus without relying on any external tool, we make use of libvirt’s API. In the libvirt source code, the function cmdDomblkstat performs the main steps to monitor disk I/O: it checks the requested options, retrieves the target device to be monitored, calls the function virDomainBlockStats, which actually collects the disk bandwidth statistics, and finally reports these statistics in a human-readable way.

Regarding partitioning of the disk I/O capabilities among different VMs, Stratus makes use of the blkdeviotune function of libvirt’s API. This method allows setting the maximum overall, read, and write bandwidths either in bytes per second or in I/O operations per second.
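For reference, the virsh front ends of the two mechanisms above can be used interactively (the device name and limits are hypothetical):

# Read/write operations, bytes, and times for the vda device of vm1 (virDomainBlockStats)
$ virsh domblkstat vm1 vda
# Cap the VM at 100 MB/s of reads and 50 MB/s of writes (blkdeviotune)
$ virsh blkdeviotune vm1 vda --read-bytes-sec 104857600 --write-bytes-sec 52428800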

Network Bandwidth. The number of network packets or bytes that go through a network interface can be monitored with libvirt’s API. The functions cmdDomIfstat and virDomainInterfaceStats of the API are employed to obtain the bandwidth consumed by a network interface.

Regarding network partitioning, libvirt allows limiting the bandwidth consumed by a given VM in both directions: inbound and outbound. Three main parameters can be specified to limit the consumption:

  • average: The target average bandwidth consumption in KB/s.

  • peak: The maximum allowed consumption in KB/s.

  • burst: The maximum amount of data, in KB, that can be transmitted in a single burst at peak speed.

Stratus makes use of libvirt’s API function virDomainSetInterfaceParameters to establish the desired bandwidth limit for a given VM.
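For reference, the equivalent virsh commands look as follows (the interface name and limits are hypothetical; average and peak are given in KB/s and burst in KB):

# Packet and byte counters of the VM's network interface (virDomainInterfaceStats)
$ virsh domifstat vm1 vnet0
# Limit inbound traffic to 10 MB/s on average, 12.5 MB/s at peak, with a 2 MB burst
$ virsh domiftune vm1 vnet0 --inbound 10240,12800,2048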

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pons, L., Petit, S., Pons, J. et al. A modular approach to build a hardware testbed for cloud resource management research. J Supercomput 80, 10552–10583 (2024). https://doi.org/10.1007/s11227-023-05856-2
