Abstract
The proliferation of multi-core, accelerator-enabled embedded systems has introduced new opportunities to consolidate real-time systems of increasing complexity. But the road to building confidence in the temporal behavior of co-running applications has presented formidable challenges. Most prominently, the main memory subsystem represents a performance bottleneck for both CPUs and accelerators, and industry-viable frameworks for full-system main memory management and performance analysis are past due. In this paper, we propose our Envelope-aWare Predictive model, or E-WarP for short. E-WarP is a methodology and technological framework to: (1) analyze the memory demand of applications following a profile-driven approach; (2) make realistic predictions on the temporal behavior of workloads deployed on CPUs and accelerators; and (3) perform saturation-aware system consolidation. This work aims at providing the technological foundations as well as the theoretical groundwork for truly workload-aware analysis of real-time systems. It combines traditional CPU-centric bandwidth regulation techniques with state-of-the-art hardware support for memory traffic shaping via the ARM QoS extensions. We make three key observations. First, our profile-driven methodology over-predicts the runtime of bandwidth-regulated applications by 6% on average. Second, we experimentally validate that the calculated bounds hold system-wide if the main memory subsystem operates below saturation. Third, we show that the E-WarP methodology is practical even when applications exhibit input-dependent memory access patterns. We provide a full implementation of our techniques on a commercial platform (NXP S32V234).
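To make the interplay between a memory profile, a regulation budget, and the saturation condition concrete, the following is a minimal illustrative sketch and not the E-WarP model itself. It assumes a generic period-and-budget regulator and a deliberately coarse slice-by-slice estimate; the function names, the profile format (transactions per regulation-period-long slice of a solo run), and the saturation threshold are all hypothetical choices made for illustration.

```python
from math import ceil


def regulated_runtime(profile, budget, period):
    """Coarse runtime estimate for a profiled task under period/budget regulation.

    profile: memory transactions issued in each consecutive `period`-long
             slice of the task's unregulated (solo) run (hypothetical format).
    budget:  transactions the regulator allows per period.
    period:  regulation period, in any consistent time unit.

    Assumption (toy model): a slice whose demand exceeds the budget is
    stretched over ceil(demand / budget) periods; any other slice still
    takes one full period.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    periods = sum(max(1, ceil(demand / budget)) for demand in profile)
    return periods * period


def fits_below_saturation(budgets, period, saturation_bw):
    """Check that the summed regulated bandwidth stays under a saturation threshold.

    budgets:       per-actor budgets (transactions per period).
    saturation_bw: assumed saturation point, in transactions per time unit.
    """
    total_bw = sum(b / period for b in budgets)
    return total_bw < saturation_bw


if __name__ == "__main__":
    # Four slices of a hypothetical solo profile, budget of 50 per 1.0 ms period.
    print(regulated_runtime([10, 50, 120, 0], budget=50, period=1.0))
    # Two actors with budgets 50 and 30 against an assumed threshold of 100.
    print(fits_below_saturation([50, 30], period=1.0, saturation_bw=100.0))
```

In this sketch, only the third slice (120 transactions against a budget of 50) is stretched, which is the kind of effect a profile-driven prediction has to capture; the second function mirrors the idea that budgets are only assigned while their sum keeps the memory subsystem below saturation.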
Notes
Contributions indicated with a * are new additions in the journal extension.
This was required to overcome the lack of a PSCI firmware provided by the vendor to control CPU shutdown.
https://github.com/rntmancuso/jailhouse-rt
The DRAM operates at half the frequency of the CPUs.
Figure 15b, c: original photos by Alexander Klein and Stefan Wernthaler, respectively, from https://www.stereoscopy.com/; Figure 15e, f: original video frames from the Visual Tracker Benchmark, respectively Basketball and CarScale data sets available at http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html. The original photos have been scaled and/or cropped to match the same resolution and aspect ratios as the default SD-VBS image files.
Acknowledgements
The material presented in this paper is based upon work supported by the National Science Foundation (NSF) under Grant Number CCF-2008799. The work was also supported through the Red Hat Research program. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the NSF.
Cite this article
Sohal, P., Tabish, R., Drepper, U. et al. Profile-driven memory bandwidth management for accelerators and CPUs in QoS-enabled platforms. Real-Time Syst 58, 235–274 (2022). https://doi.org/10.1007/s11241-022-09382-x
DOI: https://doi.org/10.1007/s11241-022-09382-x