Abstract
In multitasking real-time systems, both the worst-case execution time (WCET) of each task and the effects of interference between tasks in the worst-case scenario must be computed. This is especially complex in the presence of data caches. In this article, we propose a small (256-byte) instruction-driven data cache that effectively exploits locality. It works by preselecting a subset of memory instructions that are granted data cache replacement permission. These instructions are selected on the basis of data reuse theory. Since each selected memory instruction replaces its own data cache line, cache pollution is prevented and task performance becomes independent of the size of the associated data structures. We have modeled several memory configurations using the Lock-MS WCET analysis method. Our results show that, on average, our data cache effectively services 88% of the program data of the tested benchmarks. This doubles the worst-case performance of our tested multitasking experiments. In addition, worst-case performance reaches between 75% and 89% of the ideal case of always hitting in the instruction and data caches. We also show that partitioning our proposed hardware provides only marginal benefits in worst-case performance, so partitioning is discouraged. Finally, we study the viability of our proposal on the MiBench application suite by characterizing its data reuse, achieving hit ratios above 90% in most programs.
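To make the replacement-permission idea concrete, the following is a toy Python model, not the article's hardware. Assumptions: the 16-byte line size, the hypothetical `owners` mapping from instruction address (PC) to an owned line, the example PCs and data addresses, and a lookup that searches every line are all illustrative choices. Selecting which instructions receive permission, which the article bases on data reuse theory, is simply hard-coded here.

```python
from dataclasses import dataclass, field

LINE_SIZE = 16                     # assumed line size in bytes (illustrative)
NUM_LINES = 256 // LINE_SIZE       # 256-byte cache, as in the abstract -> 16 lines


@dataclass
class InstructionDrivenCache:
    """Toy model: each memory instruction granted replacement permission owns
    one line; every instruction may hit in any line, but only permitted
    instructions replace, and only their own line."""
    owners: dict                   # hypothetical: instruction PC -> owned line index
    tags: list = field(default_factory=lambda: [None] * NUM_LINES)
    hits: int = 0
    misses: int = 0

    def __post_init__(self):
        idxs = list(self.owners.values())
        if len(set(idxs)) != len(idxs) or not all(0 <= i < NUM_LINES for i in idxs):
            raise ValueError("each permitted instruction needs its own valid line")

    def access(self, pc: int, address: int) -> bool:
        block = address // LINE_SIZE
        if block in self.tags:     # hit, regardless of which instruction loaded it
            self.hits += 1
            return True
        self.misses += 1
        line = self.owners.get(pc)
        if line is not None:       # only a permitted instruction refills, into its own line
            self.tags[line] = block
        return False               # unpermitted misses leave the cache untouched

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical loop: PC 0x10 and PC 0x20 have permission, PC 0x30 does not.
cache = InstructionDrivenCache(owners={0x10: 0, 0x20: 1})
for i in range(64):
    cache.access(0x10, 0x1000 + 4 * i)    # sequential 4-byte loads: spatial reuse
    cache.access(0x20, 0x2000)            # one scalar reused every iteration: temporal reuse
    cache.access(0x30, 0x8000 + 64 * i)   # no reuse and no permission: never evicts
print(cache.hits, cache.misses, cache.hit_ratio())
```

With these accesses the sketch reports 111 hits out of 192 accesses. The unpermitted instruction at 0x30 misses every time but never evicts the two owned lines, and the sequential load keeps hitting 3 of every 4 accesses however many elements it traverses, which mirrors the pollution and size-independence arguments in the abstract.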
ACDC: Small, Predictable and High-Performance Data Cache