Abstract
Cache memory is important for boosting application performance, but it is also a source of execution-time variability, which makes it difficult to use in systems that require worst-case execution time (WCET) guarantees. Cache locking is a promising approach for simplifying WCET estimation and providing predictability, and several commercial processors therefore provide the ability to lock the cache. However, cache locking also has disadvantages (e.g., extra misses for unlocked blocks and the complex algorithms required to select the locked contents), so careful management is required to realize its full potential. In this article, we present a survey of techniques proposed for cache locking. We categorize the techniques into several groups to underscore their similarities and differences. We also discuss the opportunities and obstacles in using cache locking. We hope that this article will help researchers gain insight into cache locking schemes and will also stimulate further work in this area.
Index Terms
- A Survey of Techniques for Cache Locking