
A modular approach to build a hardware testbed for cloud resource management research


Abstract

Research on resource management focuses on optimizing system performance and energy efficiency by distributing shared resources, such as processor cores, caches, and main memory, among competing applications. This research spans a wide range of applications, including those from high-performance computing, machine learning, and mobile computing. Existing research frameworks often simplify the experimental work by concentrating on specific characteristics, such as the architecture of the computing nodes, resource monitoring, and representative workloads. This is typically the case with cloud systems, which introduce additional complexity in terms of hardware and software requirements. To avoid this complexity during research, experimental frameworks are being developed. Nevertheless, the proposed frameworks often fall short regarding the types of nodes included, virtualization support, and management of critical shared resources. This paper presents Stratus, an experimental framework that overcomes these limitations. Stratus includes different types of nodes, a comprehensive virtualization stack, and the ability to partition the major shared resources of the system. Although Stratus was originally conceived for cloud research, its modular design allows it to be extended, broadening its use to other computing domains and platforms while matching the complexity of modern cloud environments, as shown in the case studies presented in this paper.


Data availability

Stratus is publicly available at https://github.com/Lupones/Stratus.git. Tailbench applications are publicly available at https://github.com/supreethkurpad/Tailbench. The NAS Parallel Benchmarks can be downloaded from https://www.nas.nasa.gov/software/npb.html. The stress-ng and iperf3 microbenchmarks can be installed using the common package manager tools of Linux distributions (e.g., Ubuntu’s APT). Datasets from PMLB are publicly available at https://github.com/EpistasisLab/pmlb, and the code for the Decision Tree classifier ML method is available at https://github.com/rhiever/sklearn-benchmarks.

Notes

  1. Performance events: mem_load_retired.l3_miss, mem_load_retired.l3_hit and inst_retired.any.

  2. Performance events: Intel (inst_retired.any and cpu_clk_unhalted.ref_tsc), ARM (inst_retired and cycles).

References

  1. Mars J, Tang L (2013) Whare-map: heterogeneity in "homogeneous" warehouse-scale computers. In: Proceedings of ISCA, pp. 619–630

  2. Tang L, Mars J, Zhang X, Hagmann R, Hundt R, Tune E (2013) Optimizing Google’s warehouse scale computers: The NUMA experience. In: Proceedings of HPCA, pp. 188–197

  3. Gupta A, Milojicic D (2011) Evaluation of HPC applications on cloud. In: 2011 Sixth open cirrus summit, pp 22–26

  4. Netto MAS, Calheiros RN, Rodrigues ER, Cunha RLF, Buyya R (2018) HPC cloud for scientific and business applications: taxonomy, vision, and research challenges. ACM Comput Surv 51(1):1–29

  5. Hormozi E, Hormozi H, Akbari MK, Javan MS (2012) Using of machine learning into cloud environment (A Survey): Managing and scheduling of resources in cloud systems. In: Proceedings of 3PGCIC, pp. 363–368

  6. Sahoo J, Mohapatra S, Lath R (2010) Virtualization: a survey on concepts, taxonomy and associated security issues. In: Proceedings of ICCNT, pp. 222–226

  7. Serrano D, Bouchenak S, Kouki Y, de Oliveira Jr FA, Ledoux T, Lejeune J, Sopena J, Arantes L, Sens P (2016) SLA guarantees for cloud services. Futur Gener Comput Syst 54:233–246

  8. Buyya R, Ranjan R, Calheiros RN (2009) Modeling and simulation of scalable Cloud computing environments and the CloudSim toolkit: Challenges and opportunities. In: Proceedings of HPCS, pp. 1–11

  9. Kasture H, Sanchez D (2016) Tailbench: a benchmark suite and evaluation methodology for latency-critical applications. In: Proceedings of IISWC, pp. 1–10

  10. Masouros D, Xydis S, Soudris D (2020) Rusty: runtime interference-aware predictive monitoring for modern multi-tenant systems. IEEE Trans Parallel Distrib Syst 32(1):184–198

  11. Shekhar S, Abdel-Aziz H, Bhattacharjee A, Gokhale A, Koutsoukos X (2018) Performance interference-aware vertical elasticity for cloud-hosted latency-sensitive applications. In: Proceedings of CLOUD, pp. 82–89

  12. Chen S, Delimitrou C, Martínez JF (2019) PARTIES: QoS-aware resource partitioning for multiple interactive services. In: Proceedings of ASPLOS, pp. 107–120

  13. Pons L, Feliu J, Sahuquillo J, Gómez ME, Petit S, Pons J, Huang C (2023) Cloud white: detecting and estimating QoS degradation of latency-critical workloads in the public cloud. Futur Gener Comput Syst 138:13–25

  14. Chen Q, Xue S, Zhao S, Chen S, Wu Y, Xu Y, Song Z, Ma T, Yang Y, Guo M (2020) Alita: comprehensive performance isolation through bias resource management for public clouds. In: Proceedings of SC, pp. 1–13

  15. Gan Y, Liang M, Dev S, Lo D, Delimitrou C (2021) Sage: practical and scalable ML-driven performance debugging in microservices. In: Proceedings of ASPLOS, pp. 135–151

  16. Suresh A, Gandhi A (2021) ServerMore: opportunistic execution of serverless functions in the cloud. In: Proceedings of SoCC, pp. 570–584

  17. Patel T, Tiwari D (2020) CLITE: Efficient and QoS-aware co-location of multiple latency-critical jobs for warehouse scale computers. In: Proceedings of HPCA, pp. 193–206

  18. Chen S, Jin A, Delimitrou C, Martínez JF (2022) ReTail: opting for learning simplicity to enable QoS-aware power management in the cloud. In: Proceedings of HPCA, pp. 155–168

  19. Nishtala R, Petrucci V, Carpenter P, Sjalander M (2020) Twig: multi-agent task management for colocated latency-critical cloud services. In: Proceedings of HPCA, pp. 167–179

  20. Javadi SA, Suresh A, Wajahat M, Gandhi A (2019) Scavenger: a black-box batch workload resource manager for improving utilization in cloud environments. In: Proceedings of SoCC, pp. 272–285

  21. Pons L, Feliu J, Puche J, Huang C, Petit S, Pons J, Gómez ME, Sahuquillo J (2022) Effect of hyper-threading in latency-critical multithreaded cloud applications and utilization analysis of the major system resources. Futur Gener Comput Syst 131:194–208

  22. Ferdman M, Adileh A, Kocberber O, Volos S, Alisafaee M, Jevdjic D, Kaynak C, Popescu AD, Ailamaki A, Falsafi B (2012) Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In: Proceedings of ASPLOS

  23. Canonical Ltd: Ubuntu manpage: stress-ng. Available at https://manpages.ubuntu.com/manpages/artful/man1/stress-ng.1.html. Accessed: 2022-11-20 (2020)

  24. ESnet, NLANR, DAST: iPerf tool for network bandwidth measurements. Available at https://iperf.fr/. Accessed: 2022-11-20 (2020)

  25. Why would a cloud computing company use the SPEC CPU2017 benchmark suite? Available at https://www.spec.org/cpu2017/publications/DO-case-study.html. Accessed: 2019-08-02 (2017)

  26. Pons L, Petit S, Pons J, Gómez ME, Huang C, Sahuquillo J (2023) Stratus: A hardware/software infrastructure for controlled cloud research. In: Proceedings of PDP, pp. 299–306

  27. Belalem G, Tayeb FZ, Zaoui W (2010) Approaches to improve the resources management in the simulator CloudSim. In: Proceedings of ICICA, pp. 189–196. Springer

  28. Liang B, Dong X, Wang Y, Zhang X (2020) Memory-aware resource management algorithm for low-energy cloud data centers. Futur Gener Comput Syst 113:329–342

  29. Liu C, Li W, Wan J, Li L, Ma Z, Wang Y (2022) Resource management in cloud based on deep reinforcement learning. In: Proceedings of ICCCI, pp. 28–33

  30. Badia S, Carpen-Amarie A, Lèbre A, Nussbaum L (2013) Enabling large-scale testing of IaaS cloud platforms on the grid’5000 testbed. In: Proceedings of the International Workshop on Testing the Cloud, pp. 7–12

  31. Duplyakin D, Ricci R, Maricq A, Wong G, Duerig J, Eide E, Stoller L, Hibler M, Johnson D, Webb K, et al.: (2019) The design and operation of CloudLab. In: Proceedings of USENIX ATC, pp. 1–14

  32. Keahey K, Anderson J, Zhen Z, Riteau P, Ruth P, Stanzione D, Cevik M, Colleran J, Gunawi HS, Hammock C, et al.: (2020) Lessons learned from the chameleon testbed. In: Proceedings of USENIX ATC, pp. 219–233

  33. Sfakianakis Y, Marazakis M, Bilas A (2021) Skynet: Performance-driven resource management for dynamic workloads. In: Proceedings of CLOUD, pp. 527–539

  34. Bienia C, Kumar S, Singh JP, Li K (2008) The PARSEC benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 72–81

  35. Cai B, Li K, Zhao L, Zhang R (2022) Less provisioning: a hybrid resource scaling engine for long-running services with tail latency guarantees. IEEE Trans Cloud Comput 10(3):1941–1957

  36. Ma L, Liu Z, Xiong J, Jiang D (2022) QWin: Core allocation for enforcing differentiated tail latency SLOs at shared storage backend. In: 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS), pp. 1098–1109

  37. Zhang Y, Chen J, Jiang X, Liu Q, Steiner IM, Herdrich AJ, Shu K, Das R, Cui L, Jiang L (2021) LIBRA: clearing the cloud through dynamic memory bandwidth management. In: Proceedings of HPCA, pp. 815–826

  38. Dean J, Barroso LA (2013) The tail at scale. Commun ACM 56(2):74–80

  39. Nishtala R, Fugal H, Grimm S, Kwiatkowski M, Lee H, Li HC, McElroy R, Paleczny M, Peek D, Saab P et al.: (2013) Scaling Memcache at Facebook. In: Proceedings of NSDI, pp. 385–398

  40. Li J, Sharma NK, Ports DR, Gribble SD (2014) Tales of the tail: Hardware, os, and application-level sources of tail latency. In: Proceedings of SOCC, pp. 1–14

  41. Google Cloud Compute Engine - CPU platforms [online]. Available at https://cloud.google.com/compute/docs/cpu-platforms. Accessed: 2022-11-14 (2022)

  42. Amazon’s EC2 [online]. Available at https://aws.amazon.com/ec2/instance-types/?nc1=h_ls. Accessed: 2022-11-14 (2022)

  43. Huawei Elastic Cloud Server (ECS) [online]. Available at https://www.huaweicloud.com/intl/en-us/product/ecs.html. Accessed: 2022-11-14 (2022)

  44. Sefraoui O, Aissaoui M, Eleuldj M (2012) OpenStack: toward an open-source solution for cloud computing. Int J Comput Appl 55(3):38–42

  45. Kivity A, Kamay Y, Laor D, Lublin U, Liguori A (2007) kvm: the Linux virtual machine monitor. In: Proceedings of the Linux Symposium, vol. 1, pp. 225–230

  46. Barham P, Dragovic B, Fraser K, Hand S, Harris T, Ho A, Neugebauer R, Pratt I, Warfield A (2003) Xen and the art of virtualization. ACM SIGOPS Op Syst Rev 37(5):164–177

  47. Amazon Web Services [online]. Available at https://aws.amazon.com/ec2/faqs/?nc1=h_ls. Accessed: 2022-11-28 (2022)

  48. Google Compute Engine FAQ [online]. Available at https://cloud.google.com/compute/docs/faq. Accessed: 2022-11-28 (2022)

  49. Libvirt: The virtualization API [online]. Available at https://libvirt.org. Accessed: 2022-11-28 (2022)

  50. QEMU [online]. Available at https://www.qemu.org. Accessed: 2022-11-28 (2022)

  51. Pfaff B, Pettit J, Koponen T, Jackson E, Zhou A, Rajahalme J, Gross J, Wang A, Stringer J, Shelar P, Amidon K, Casado M (2015) The design and implementation of open vSwitch. In: Proceedings of NSDI, pp. 117–130. USENIX Association

  52. Russell R (2008) virtio: towards a de-facto standard for virtual I/O devices. ACM SIGOPS Op Syst Rev 42(5):95–103

  53. DPDK [online]. Available at https://www.dpdk.org/. Accessed: 2022-11-28 (2022)

  54. Weil S, Brandt S, Miller E, Long D, Maltzahn C (2006) Ceph: A scalable, high-performance distributed file system. In: Proceedings of OSDI, pp. 307–320

  55. Padoin EL, Pilla LL, Castro M, Boito FZ, Navaux POA, Méhaut J-F (2015) Performance/energy trade-off in scientific computing: the case of ARM big.LITTLE and Intel Sandy Bridge. IET Computers & Digital Techniques 9(1):27–35

  56. Criado J, Garcia-Gasulla M, Kumbhar P, Awile O, Magkanaris I, Mantovani F (2020) CoreNEURON: performance and energy efficiency evaluation on intel and arm CPUs. In: Proceedings of CLUSTER, pp. 540–548 . IEEE

  57. Mitra G, Johnston B, Rendell AP, McCreath E, Zhou J (2013) Use of SIMD vector operations to accelerate application code performance on low-powered ARM and intel platforms. In: Proceedings of IPDPSW, pp. 1107–1116

  58. Flynn P, Yi X, Yan Y (2022) Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures. In: Proceedings of PMAM, pp. 11–20

  59. Gleixner T, Molnar I (2009) Performance counters for Linux

  60. Intel Corporation: Intel RDT Library. Available at https://github.com/intel/intel-cmt-cat (2021)

  61. Jia R, Yang Y, Grundy J, Keung J, Hao L (2021) A systematic review of scheduling approaches on multi-tenancy cloud platforms. Inf Softw Technol 132:106478

  62. Wang Z, Xu C, Agrawal K, Li J (2022) Adaptive scheduling of multiprogrammed dynamic-multithreading applications. J Parallel Distrib Comput 162:76–88

  63. Lu C, Ye K, Xu G, Xu C-Z, Bai T (2017) Imbalance in the cloud: An analysis on Alibaba cluster trace. In: Proceedings of Big Data, pp. 2884–2892

  64. Liu Q, Yu Z (2018) The Elasticity and Plasticity in Semi-Containerized Co-Locating Cloud Workload: A View from Alibaba Trace. In: Proceedings of SoCC, pp. 347–360

  65. Cortez E, Bonde A, Muzio A, Russinovich M, Fontoura M, Bianchini R (2017) Resource central: understanding and predicting workloads for improved resource management in large cloud platforms. In: Proceedings of SOSP, pp. 153–167

  66. Intel: Improving real-time performance by utilizing cache allocation technology. Intel Corporation, April (2015)

  67. Andrew H, Abbasi Khawar M, Marcel C (2019) Introduction to Memory Bandwidth Allocation. Available at https://software.intel.com/en-us/articles/introduction-to-memory-bandwidth-allocation

  68. Lo D, Cheng L, Govindaraju R, Ranganathan P, Kozyrakis C (2015) Heracles: Improving resource efficiency at scale. In: Proceedings of ISCA, pp. 450–462

  69. Yang H, Breslow A, Mars J, Tang L (2013) Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers. In: Proceedings of ISCA, pp. 607–618. Association for Computing Machinery, New York, NY, USA

  70. Moradi H, Wang W, Zhu D (2020) DiHi: distributed and hierarchical performance modeling of multi-VM cloud running applications. In: Proceedings of HPCC/SmartCity/DSS, pp. 1–10

  71. Gupta A, Kale LV, Gioachin F, March V, Suen CH, Lee B-S, Faraboschi P, Kaufmann R, Milojicic D (2013) The Who, What, Why, and How of High Performance Computing in the Cloud. In: Proceedings of CloudCom, vol. 1, pp. 306–314

  72. Gupta A, Kalé LV, Milojicic D, Faraboschi P, Balle SM (2013) HPC-Aware VM Placement in Infrastructure Clouds. In: Proceedings of IC2E, pp. 11–20

  73. Jin H-Q, Frumkin M, Yan J (1999) The OpenMP implementation of NAS parallel benchmarks and its performance

  74. Jackson A, Turner A, Weiland M, Johnson N, Perks O, Parsons M (2019) Evaluating the arm ecosystem for high performance computing. In: Proceedings of PASC, pp. 1–11. Association for Computing Machinery

  75. Chen S, Galon S, Delimitrou C, Manne S, Martínez JF (2017) Workload characterization of interactive cloud services on big and small server platforms. In: Proceedings of IISWC, pp. 125–134

  76. Hammond SD, Hughes C, Levenhagen MJ, Vaughan CT, Younge AJ, Schwaller B, Aguilar MJ, Pedretti KT, Laros JH (2019) Evaluating the Marvell ThunderX2 Server Processor for HPC Workloads. In: Proceedings of HPCS, pp. 416–423

  77. Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017) PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining 10(1):1–13

  78. Olson RS, La Cava W, Mustahsan Z, Varik A, Moore JH (2017) Data-driven Advice for Applying Machine Learning to Bioinformatics Problems. arXiv e-print. arXiv:1708.05070

  79. Soria Pardos V (2019) Characterization of HPC applications for ARM SIMD instructions. B.S. thesis, Universitat Politècnica de Catalunya

  80. Feliu J, Sahuquillo J, Petit S, Duato J (2013) L1-bandwidth aware thread allocation in multicore SMT processors. In: Proceedings of PACT, pp. 123–132

  81. Slimani S, Hamrouni T, Ben Charrada F (2021) Service-oriented replication strategies for improving quality-of-service in cloud computing: a survey. Clust Comput 24:361–392

  82. Barroso LA, Clidaras J, Hölzle U (2013) The datacenter as a computer: an introduction to the design of warehouse-scale machines. Second Edition. https://doi.org/10.2200/S00516ED2V01Y201306CAC024

  83. Kang Y, Zheng Z, Lyu MR (2012) A latency-aware co-deployment mechanism for cloud-based services. In: Proceedings of CLOUD, pp. 630–637

  84. Zhang Y, Hua W, Zhou Z, Suh GE, Delimitrou C (2021) Sinan: ML-Based and QoS-Aware resource management for cloud microservices. In: Proceedings of ASPLOS. Association for Computing Machinery, New York, NY, USA

  85. Zhang I, Raybuck A, Patel P, Olynyk K, Nelson J, Leija OSN, Martinez A, Liu J, Simpson AK, Jayakar S et al.: (2021) The demikernel datapath os architecture for microsecond-scale datacenter systems. In: Proceedings of SOSP, pp. 195–211

  86. Michael Bayer et al.: Mako Templates. Available at http://www.makotemplates.org/ (2019)

  87. Mills DL (1991) Internet time synchronization: the network time protocol. IEEE Trans Commun 39(10):1482–1493


Acknowledgements

Not applicable.

Funding

This work has been supported by the Spanish Ministerio de Universidades under the grant FPU18/01948 and by the Spanish Ministerio de Ciencia e Innovación and European ERDF under grants PID2021-123627OB-C51 and TED2021-130233B-C32, and by Generalitat Valenciana under Grant AICO/2021/266.

Author information


Contributions

All authors contributed equally to the work’s conception and design. Platform design and software development were performed by LP, SP, and JP. Data collection and representation were performed by LP. JS was in charge of project administration, supervision, and data validation. JS, MEG, and SP were responsible for funding acquisition and administration. All authors contributed to the writing and reviewing of the manuscript.

Corresponding author

Correspondence to Lucia Pons.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Stratus resource and application manager

To ease reproducibility, this appendix presents the main methods used in the Stratus Resource and Application Manager. Below, each of the steps performed to carry out a typical experiment with one or more VMs running applications is discussed in detail.

A.1 Execution of experiments

1) Define experiment: workload and parameters. As a prior step, the workload (i.e., the VMs and the applications to be run on them) and the experimental conditions must be defined. To ease this task, Stratus makes use of MAKO templates [86], which provide a simple and intuitive language to specify the parameters of the experiments. Figure 14 shows an example of a MAKO template with a configuration to run a VM with the xapian Tailbench application.

Fig. 14: Example of a template to execute the TailBench application xapian in a VM

The template begins with an include directive, which allows including detailed information (stored in the applications.mako file) regarding the command used to execute each application. In the tasks section of the template, the user must indicate which applications (app) are going to be executed, specifying their VM (domain_name, snapshot_name, ip, and port). The VCPUs are set according to the physical CPUs defined in the cpus field. Regarding the application arguments, the arguments field configures application-specific input parameters for the server-side workload (e.g., the number of requests). Similarly, the client-arguments field defines the client-side arguments, such as the number of queries per second.

The cmd section details information about the execution of the framework, such as the length of each interval or quantum in seconds (ti), the maximum number of intervals to be executed (mi), the core(s) where the framework process is pinned (cpu-affinity), and the list of events to be monitored using hardware performance counters (event).

Notice that the template can also specify whether VMs are only allowed to use a partition of a shared resource (LLC, memory bandwidth, network bandwidth, or disk bandwidth). For example, in Fig. 14, the VM can only make use of 4 of the 11 available LLC ways.

2) Execute the launch script to start the manager. To start running an experiment, the user executes the launch script. First, the script prepares the execution environment; for instance, it fixes the processor frequency to avoid variability across experiments.

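The original listing is shown as a figure in the article; a minimal sketch of such a step, assuming the cpupower utility is available and using a hypothetical target frequency of 2.1 GHz:

# Fix all cores to a constant frequency to reduce run-to-run variability
$ sudo cpupower frequency-set -g performance
$ sudo cpupower frequency-set -d 2.1GHz -u 2.1GHz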

Additionally, the clocks of both the server and client machines are synchronized to ensure that server- and client-collected metrics are aligned, using the Network Time Protocol (NTP) [87] with a known NTP time server (e.g., europe.pool.ntp.org).

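Again, the exact listing appears as a figure in the article; a minimal sketch of a one-shot synchronization against the server mentioned above (ntpdate is an assumption, any NTP client would do):

# Run on both the server and the client machine
$ sudo ntpdate europe.pool.ntp.org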

When the environment is ready, the configuration file is generated with the workloads to execute and all the experiment parameters from the MAKO template. Then, the manager starts to run.

3) Prepare VMs for execution. The first step the manager performs is setting up and starting the VMs using both libvirt and Ceph libraries and utilities. To reduce the start-up overhead, the manager uses the snapshot feature of Ceph. A snapshot is a copy of the state of a VM, including the disk and main memory contents; it preserves a VM’s state and data at a given point in time, so the VM can be reverted to that state at any moment. For each VM, we have taken a snapshot after the OS boot process has completed, so the VM is ready to receive the command to launch the target benchmark. Thus, we load the snapshot and then start running the VM:

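A minimal sketch of this step with the libvirt CLI, assuming a hypothetical domain vm1 with a snapshot named booted taken right after the OS boot (the article combines libvirt with Ceph utilities for the disk images):

# Revert the VM to the post-boot snapshot and resume it
$ virsh snapshot-revert vm1 booted --running
# Alternatively, start the domain if it was reverted in a shut-off state
$ virsh start vm1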

Once the VMs are started and the snapshots are loaded, the number of CPUs of each VM (i.e., VCPUs) can be modified in case a multi-threaded application is going to be executed and more than one CPU is required.

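A sketch of the VCPU hot-add step with the libvirt CLI (domain name and VCPU count are hypothetical):

# Raise the number of VCPUs of the running VM to 4
$ virsh setvcpus vm1 4 --live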

After the VCPUs are added, they are enabled in the guest.

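A sketch of enabling the hot-added CPUs inside the guest, assuming a Linux guest reachable over SSH:

# Bring the newly added CPUs (e.g., CPUs 1-3) online inside the guest
$ ssh user@vm1 'for c in 1 2 3; do echo 1 | sudo tee /sys/devices/system/cpu/cpu$c/online; done'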

In addition, we can specify the affinity of the VCPUs to cores of the host machine (defined in cpumap):

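A sketch of pinning VCPUs to host cores with the libvirt CLI (the VCPU-to-core mapping is hypothetical):

# Pin VCPU 0 of vm1 to host core 4 and VCPU 1 to host core 5
$ virsh vcpupin vm1 0 4
$ virsh vcpupin vm1 1 5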

4) Set up resource monitoring and partitioning. With QEMU, each VCPU is associated with a process ID (PID) in the host OS. These PIDs are required to monitor hardware performance counters with Perf individually for each thread (i.e., VCPU) of the VM. Similarly, LLC and memory bandwidth monitoring is performed on a PID basis. Therefore, the manager must get the list of PIDs of the VCPUs:

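A sketch of one way to obtain these host-side thread IDs through the QEMU monitor (the article retrieves them through libvirt's API; the domain name is hypothetical):

# List the QEMU vCPU threads; the thread_id column is the PID used with Perf and PQoS
$ virsh qemu-monitor-command vm1 --hmp "info cpus"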

The remaining resources, network and disk bandwidth, are monitored per VM. The manager also allows partitioning the main system shared resources, assigning each VM a share of a given resource. Therefore, if specified in the template, a resource share is allocated to the VM (more details in Appendix A.2).

5) Start running applications in the VMs. When the VMs are operational and ready to start executing the applications, an SSH command is sent to each VM to start the execution of each workload.

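A sketch of this launch step, assuming a hypothetical guest user, IP address, and benchmark script:

# Launch the server-side workload inside the VM in the background
$ ssh user@192.168.122.10 'nohup ./run_xapian_server.sh > run.log 2>&1 &'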

Stratus’ manager is adapted to support the execution of client–server workloads (e.g., TailBench benchmark suite) and best-effort or batch workloads (e.g., stressor microbenchmarks or SPEC CPU benchmarks). In the case of a client–server workload, an SSH command is sent to the client node to start running the clients, which send requests to the server (already running).

6) Perform actions in each quantum. Once the execution starts, the manager executes the main loop (see Fig. 4) for the rest of the execution time. In each iteration, the manager is suspended for a given quantum length (established in the template). Then, data is collected from different sources (e.g., hardware performance counters, Linux file system, Intel library, or libvirt) to monitor the main system resources (CPU usage, LLC occupancy, main memory, network, and disk bandwidth). Additionally, the manager is adapted to allow implementing and applying QoS policies. For instance, policies that manage resource sharing among VMs [12, 14, 20], predict interference among VMs [11, 13, 17, 18, 61] or schedule VMs [62].

7) Execution end. The main loop ends when the manager detects that all the VMs have finished running their applications, at which point it shuts down the running VMs.

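A sketch of the shutdown step with the libvirt CLI (domain name hypothetical):

# Request a clean guest shutdown; force it off only if the guest does not respond
$ virsh shutdown vm1
$ virsh destroy vm1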

All the data collected from the hardware performance counters and system resources are stored in CSV files, ready to be processed. The events monitored by the hardware performance counters are those specified in template.mako (events field in the cmd section). The supported events can be listed with # perf list or by checking the events supported by the microarchitecture of the machine. To monitor the main shared resources of the system, the framework collects data regarding LLC occupancy and memory, disk, and network bandwidth (more details can be found in the following section).
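As an illustration, the per-VCPU counters described above could be collected with a Perf invocation along these lines (the events are those mentioned in the notes; the PID and interval are hypothetical):

# Sample two of the events used in the paper every second for vCPU thread 12345, in CSV format
$ perf stat -e inst_retired.any,mem_load_retired.l3_miss -p 12345 -I 1000 -x,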

Additionally, for characterization and debugging purposes, statistics and data are also collected inside the VMs. For instance, in Tailbench workloads, the clients report results such as latency per query, queries per interval, and tail latency.

A.2 Monitoring and partitioning main shared resources

Driven by the popularity of resource management research in the last few years, server processors have been equipped with advanced technologies that allow monitoring and partitioning the major system resources.

Below, we explain how monitoring and partitioning of each shared resource is implemented in Stratus without relying on any external tool.

CPU Utilization. To obtain the utilization of each CPU, we use the data collected from the file /proc/stat, which reports statistics about kernel activity aggregated since the system first booted. Additionally, statistics on the CPU utilization of each VM (i.e., per-VCPU utilization) are obtained using libvirt’s function virDomainGetCPUStats.
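A sketch of both sources from the command line (the domain name is hypothetical); virsh cpu-stats is the CLI front end of virDomainGetCPUStats:

# Per-CPU aggregate counters since boot (user, nice, system, idle, ...)
$ grep '^cpu' /proc/stat
# Per-vCPU and total CPU time of a VM, as reported by libvirt
$ virsh cpu-stats vm1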

Last Level Cache (LLC). Recently, some processor manufacturers like Intel have developed technologies that allow monitoring and partitioning the LLC. In Intel processors, these technologies are known as Cache Monitoring Technology (CMT) and Cache Allocation Technology (CAT) [66]. Partitioning is performed using Classes of Service (CLOS), which can be defined either as groups of applications (PIDs) or as groups of logical cores to which a partition of the LLC is assigned. The LLC is partitioned on a per-way basis; that is, a cache way is the allocation granularity of a CLOS.

To monitor the amount of LLC space occupied by each application, the performance event PQOS_MON_EVENT_L3_OCCUP reports the occupancy in bytes.
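As an illustration, the same functionality can be exercised from the command line with the pqos utility of the Intel RDT software [60] (the CLOS number, way mask, and core list are hypothetical; Stratus drives these technologies through the library API instead):

# Define CLOS 1 with a 4-way LLC mask and attach host cores 4-7 to it
$ sudo pqos -e "llc:1=0x00f"
$ sudo pqos -a "llc:1=4-7"
# Monitor the LLC occupancy of those cores
$ sudo pqos -m "llc:4-7"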

Memory Bandwidth. Recent Intel Xeon Scalable processors introduce Memory Bandwidth Allocation (MBA) [67], which allows distributing memory bandwidth among the running applications. More precisely, it allows controlling the memory bandwidth between the L2 and the L3 (i.e., LLC) caches. Similarly to CAT, MBA works with CLOS; that is, MBA bandwidth limits apply only to CLOS, to which the user can assign tasks (PIDs) or cores. However, MBA operates on a per-core basis: if two applications running on the same core are limited with different values, the maximum limitation is the one applied to that core.

Regarding memory bandwidth monitoring, three performance events are available:

  1. PQOS_MON_EVENT_LMEM_BW monitors the local memory bandwidth (that is, memory read from the same processor socket).

  2. PQOS_MON_EVENT_RMEM_BW monitors the remote memory bandwidth (that is, memory read from another processor socket).

  3. PQOS_MON_EVENT_TMEM_BW monitors the total memory bandwidth.

These events return the number of bytes read from memory up to the time the performance counter is polled. Therefore, to calculate the actual memory bandwidth, we must subtract the previous reading from the current one and divide the result by the time elapsed between both readings (i.e., the interval length).
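A sketch of the command-line equivalent of MBA and the bandwidth events (the throttling percentage, CLOS, and cores are hypothetical; Stratus computes the bandwidth itself from the raw byte counters as described above):

# Throttle CLOS 1 to roughly 50% of the available memory bandwidth and attach cores 4-7
$ sudo pqos -e "mba:1=50"
$ sudo pqos -a "llc:1=4-7"
# Monitor the local and remote memory bandwidth of those cores
$ sudo pqos -m "mbl:4-7;mbr:4-7"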

To monitor and partition the LLC and memory bandwidth from Stratus, we have used the Intel RDT (PQoS) library [60]. This library provides methods and directives to configure and use all the resource partitioning technologies. More specifically, we have used the following headers:

  1. pqos.h, which contains the platform QoS API and data structure definitions. Different methods are defined for the LLC and memory bandwidth.

  2. os_monitoring.h, which contains the methods of the PQoS OS monitoring module. The monitoring methods from pqos.h cannot be used because they are incompatible with Perf, so this module must be used instead. The same methods are used to monitor LLC occupancy and memory bandwidth. First, at the start of the execution, monitoring must be set up for each task with the method monitor_setup. Then, in each interval, the values of each performance event are retrieved with the method monitor_get_values.

Disk Bandwidth. I/O access to the disks can be monitored using the virsh tool or libvirt’s API. Both mechanisms offer the same functionality and allow monitoring the number of read, write, and flush operations, the number of bytes read and written, and the total duration of the read, write, and flush operations. To integrate disk I/O monitoring into Stratus without relying on any external tool, we make use of libvirt’s API. In the libvirt source code, the function cmdDomblkstat performs the main steps to monitor disk I/O: it checks the requested options, retrieves the target device to be monitored, calls the function virDomainBlockStats, which actually collects the disk bandwidth statistics, and finally reports these statistics in a human-readable way.

Regarding partitioning of the disk I/O capabilities among different VMs, Stratus makes use of the blkdeviotune function of libvirt’s API. This method allows setting the maximum overall, read, and write bandwidths either in bytes per second or in I/O operations per second.
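For reference, the virsh front ends of the two mechanisms above can be used interactively (the device name and limits are hypothetical):

# Read/write operations, bytes, and times for the vda device of vm1 (virDomainBlockStats)
$ virsh domblkstat vm1 vda
# Cap the VM at 100 MB/s of reads and 50 MB/s of writes (blkdeviotune)
$ virsh blkdeviotune vm1 vda --read-bytes-sec 104857600 --write-bytes-sec 52428800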

Network Bandwidth. The number of network packets or bytes that go through a network interface can be monitored with libvirt’s API. The functions cmdDomIfstat and virDomainInterfaceStats of the API are employed to obtain the bandwidth consumed by a network interface.

Regarding network partitioning, libvirt allows limiting the bandwidth consumed by a given VM in both directions: inbound and outbound. Three main parameters can be specified to limit the consumption:

  • average: The target average bandwidth consumption in KB/s.

  • peak: The maximum allowed consumption in KB/s.

  • burst: The maximum amount of data, in KB, that can be transmitted in a single burst at peak speed.

Stratus makes use of libvirt’s API function virDomainSetInterfaceParameters to establish the desired bandwidth limit for a given VM.
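For reference, the equivalent virsh commands look as follows (the interface name and limits are hypothetical; average and peak are given in KB/s and burst in KB):

# Packet and byte counters of the VM's network interface (virDomainInterfaceStats)
$ virsh domifstat vm1 vnet0
# Limit inbound traffic to 10 MB/s on average, 12.5 MB/s at peak, with a 2 MB burst
$ virsh domiftune vm1 vnet0 --inbound 10240,12800,2048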

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pons, L., Petit, S., Pons, J. et al. A modular approach to build a hardware testbed for cloud resource management research. J Supercomput 80, 10552–10583 (2024). https://doi.org/10.1007/s11227-023-05856-2
