ABSTRACT
Unarbitrated contention over shared resources at different levels of the memory hierarchy is a major source of temporal interference. Hardware manufacturers are increasingly receptive to this problem and are starting to propose concrete solutions to mitigate it. Intel Resource Director Technology (RDT) represents one such attempt. Given the wide adoption of Intel platforms, RDT features can be an invaluable asset for the consolidation of real-time systems on complex multi- and many-core machines.
Unfortunately, a systematic analysis of the capabilities introduced by the RDT framework has not yet been conducted. Moreover, no clear understanding has emerged of the implementation-specific behavior of RDT primitives across processor generations. Ultimately, the ability of RDT to provide real-time guarantees is yet to be established.
In this work, we conduct a systematic investigation of the RDT mechanisms from a real-time perspective. We experimentally evaluate the functionality and interpretability of RDT-aided allocation and monitoring controls across the two most recent processor generations. Our evaluations show that while some features, such as Cache Allocation Technology (CAT), yield promising results, the implementation of other primitives, such as Memory Bandwidth Allocation (MBA), has much room for improvement. Moreover, in some cases, the presented interfaces range from unclear to incomplete, as is the case for MBA and Memory Bandwidth Monitoring (MBM).