Abstract
Adoption of multi- and many-core processors in real-time systems has so far been slowed down, if not totally barred, due to the difficulty of providing analytical real-time guarantees on worst-case execution times. The Predictable Execution Model (PREM) has been proposed to solve this problem, but its practical support requires significant code refactoring, a task better suited to a compilation tool chain than to human programmers. Implementing a PREM compiler that conforms to PREM requirements presents significant challenges, such as guaranteeing upper bounds on memory footprint and generating efficient, schedulable non-preemptive regions. This article presents a comprehensive description of how a PREM compiler can be implemented, based on several years of experience from the community. We provide accumulated insights on how best to balance conformance to real-time requirements against performance, and we present novel techniques that extend the applicability of PREM from simple benchmark suites to real-world applications. We show that code transformed by the PREM compiler enables timing-predictable execution on modern commercial off-the-shelf hardware, providing novel insights on how PREM can protect 99.4% of memory accesses on caches with a random replacement policy at only 16% performance loss on benchmarks from the PolyBench benchmark suite. Finally, we show that the requirements imposed on the programming model are well aligned with current coding guidelines for timing-critical software, promoting easy adoption.