
Time-sensitivity-aware shared cache architecture for multi-core embedded systems

The Journal of Supercomputing

Abstract

In embedded systems such as automotive systems, multi-core processors are expected to improve performance and reduce manufacturing cost by integrating multiple functions on a single chip. However, inter-core interference in the shared last-level cache (LLC) results in increased and unpredictable execution times for time-sensitive tasks (TSTs), which have (soft) timing constraints, thereby increasing the deadline miss rates of such systems. In this paper, we propose a time-sensitivity-aware dead block-based shared LLC architecture to mitigate these problems. First, a time-sensitivity indication bit is added to each cache block, which allows the proposed LLC architecture to be aware of instructions and data belonging to TSTs. Second, portions of the LLC space are allocated to general tasks without interfering with TSTs through a time-sensitivity-aware dead block-based cache partitioning technique. Third, to further reduce the deadline miss rate of TSTs, we propose a task-matching and cache partitioning scheme for shared caches that considers the memory access characteristics and the time-sensitivity of tasks (TATS). TATS is combined with our proposed dead block-based scheme. Our evaluation shows that the proposed schemes reduce the deadline miss rates of TSTs compared to conventional shared caches. On a dual-core system, compared to a baseline, equal partitioning, and state-of-the-art quality-of-service-aware cache partitioning, our proposed dead block-based cache partitioning provides 9.3%, 30.5%, and 2.6% lower average deadline miss rates, respectively. On a quad-core system, compared to the same three configurations, the combination of our proposed schemes provides 21.2%, 17.7%, and 4.1% lower average deadline miss rates, respectively.
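To make the first idea concrete, the following toy sketch (our own illustration, not the authors' implementation) shows how a per-block time-sensitivity bit could steer victim selection so that blocks filled by TSTs are protected while general-task blocks remain in the set; all class and method names here are hypothetical:

```python
class CacheBlock:
    def __init__(self, tag=None):
        self.tag = tag
        self.ts_bit = False  # set when the block is filled on behalf of a TST

class CacheSet:
    """Toy set-associative cache set: on a fill, prefer to evict blocks
    that do NOT belong to time-sensitive tasks (illustration only)."""

    def __init__(self, ways):
        self.blocks = [CacheBlock() for _ in range(ways)]

    def victim(self):
        # Choose a general-task block if one exists; otherwise fall back
        # to any block (here, simply the first way).
        for b in self.blocks:
            if not b.ts_bit:
                return b
        return self.blocks[0]

    def fill(self, tag, time_sensitive):
        b = self.victim()
        b.tag, b.ts_bit = tag, time_sensitive
        return b

s = CacheSet(ways=4)
s.fill(0x1A, time_sensitive=True)
v = s.victim()
assert v.ts_bit is False  # a TST block is not chosen while a GT block remains
```

The real architecture layers dead-block detection and partitioning on top of this bit; the sketch only shows the eviction preference the bit enables.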




Notes

  1. In this study, we do not consider applications with hard timing constraints, such as engine control and powertrain, which require very strict timing guarantees. We focus on applications that can tolerate some deadline misses; these are called time-sensitive applications in this paper and in [12].

  2. The task categorization depends on the timing requirements of the task. If a task is executed periodically and has a (soft) deadline, it is categorized as a TST. For example, the decoding task of a video application that requires a frame rate of 30 fps can be considered a TST because it is executed periodically and must complete within 33 ms to achieve the required frame rate. On the other hand, if a task has no deadline but requires higher overall performance, it is categorized as a GT.
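The categorization rule above can be sketched as a tiny predicate; the helper name and interface are hypothetical, and only the 30 fps arithmetic comes from the note:

```python
def categorize(periodic: bool, deadline_ms) -> str:
    """A periodically executed task with a (soft) deadline is a TST;
    a task without a deadline is a general task (GT)."""
    return "TST" if periodic and deadline_ms is not None else "GT"

# At 30 fps, each frame must be decoded within 1000 / 30 ≈ 33 ms.
frame_deadline_ms = 1000 / 30
assert categorize(True, frame_deadline_ms) == "TST"   # periodic + deadline
assert categorize(False, None) == "GT"                # throughput-oriented task
```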

  3. In this paper, performance predictability is defined as the inverse of the difference between the shortest and longest execution times of a task. Higher performance predictability indicates smaller performance variation and better tail performance.
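The definition in this note is just the reciprocal of the observed execution-time range, which the following sketch computes (the function name is ours; the sample values are made up for illustration):

```python
def performance_predictability(exec_times):
    """Predictability = 1 / (longest - shortest observed execution time)."""
    return 1.0 / (max(exec_times) - min(exec_times))

# A task whose jobs take 10–12 ms is more predictable than one
# whose jobs take 10–20 ms.
stable   = performance_predictability([10.0, 11.0, 12.0])  # 1 / 2  = 0.5
variable = performance_predictability([10.0, 15.0, 20.0])  # 1 / 10 = 0.1
assert stable > variable
```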

  4. In this paper, a TST of an application is defined as the core task that runs periodically and has soft timing constraints. A job is defined as an instance of the TST. A detailed model is presented in Sect. 4, and our evaluation methodology is similar to [27].

  5. We do not consider tasks having hard timing constraints in this paper.

  6. We assume high utilization to examine the impact of inter-core cache interference. High utilization is preferred because it lowers the manufacturing cost of the system.

  7. In this study, we allocate equal-size cache partitions to TSTs. The cache space is not wasted, however, because the actual partitioning is performed dynamically at runtime: when some tasks are idle, the other tasks can occupy their cache space.

  8. The parameters are also used in Algorithms 3 and 4.

  9. This is because the maximum number of parallel tasks in a multi-core system equals the number of cores. The maximum number of groups equals the number of cores when only TSTs are running.

  10. If the number of cache partitions is not evenly divisible by the number of current groups (e.g., 16 partitions shared by 3 groups), the remaining partitions are randomly allocated to the groups.
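The allocation rule in this note (even split, random assignment of the remainder) can be sketched as follows; the function and group names are hypothetical:

```python
import random

def allocate_partitions(num_partitions, groups):
    """Evenly split cache partitions among groups; leftover partitions
    are handed out to randomly chosen groups (sketch of this note's rule)."""
    base, remainder = divmod(num_partitions, len(groups))
    alloc = {g: base for g in groups}
    for _ in range(remainder):
        alloc[random.choice(groups)] += 1
    return alloc

# The note's example: 16 partitions shared by 3 groups -> 5 + 5 + 5,
# with the one remaining partition given to a random group.
alloc = allocate_partitions(16, ["g0", "g1", "g2"])
assert sum(alloc.values()) == 16        # every partition is assigned
assert min(alloc.values()) >= 16 // 3   # each group gets at least the base share
```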

  11. At most 4 cores are used in this paper; therefore, a 2-bit group field suffices. Even with 64 cores, only 6 bits would be needed.
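The field width in this note follows from needing one group identifier per core, i.e., ceil(log2(number of cores)) bits; a quick check (helper name is ours):

```python
import math

def group_field_bits(num_cores):
    """Bits needed to distinguish one group per core."""
    return max(1, math.ceil(math.log2(num_cores)))

assert group_field_bits(4) == 2    # the 2-bit field used in this paper
assert group_field_bits(64) == 6   # even 64 cores need only 6 bits
```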

  12. \(t_{CL}\): CAS (Column Address Strobe) latency, \(t_{RCD}\): row address to column address delay, \(t_{RP}\): row precharge time.

  13. In the experiments, we used configurations that fit the working sets of the MiBench benchmarks and that model inter-core cache interference under harsh conditions.

  14. In this paper, we profile the benchmarks with a halved LLC. For more partitions, one can profile the tasks with LLCs partitioned into more than two segments. Nevertheless, halving the LLC space is a good estimator of the performance sensitivities of tasks. A similar categorization is used in [46].

  15. To estimate the probability density of the execution times, kernel density estimation is applied to the data. We used the normal (Gaussian) kernel function, which places a scaled standard normal density around each sampled execution time.
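A from-scratch sketch of Gaussian kernel density estimation as described in this note; the bandwidth and the sample execution times are our own illustrative choices, not values from the paper:

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function: a standard normal kernel, scaled by
    `bandwidth`, is centred on each sample and the kernels are averaged."""
    n = len(samples)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
            / (bandwidth * math.sqrt(2.0 * math.pi))
            for s in samples
        ) / n

    return density

# Hypothetical execution times (ms) of jobs of a task.
times = [30.1, 30.4, 30.2, 31.0, 30.3]
pdf = gaussian_kde(times, bandwidth=0.3)
assert pdf(30.3) > pdf(33.0)  # density peaks near the observed samples
```

In practice a library estimator (e.g., one with automatic bandwidth selection) would be used; the sketch only shows the mechanism.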

References

  1. Anderson JH, Bud V, Devi UC (2005) An EDF-based scheduling algorithm for multiprocessor soft real-time systems. In: Proceedings of the 17th Euromicro Conference on Real-Time Systems

  2. Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. SIGARCH Comput Archit News 39(2):1–7


  3. Bui BD, Caccamo M, Sha L, Martinez J (2008) Impact of cache partitioning on multi-tasking real time embedded systems. In: Embedded and Real-Time Computing Systems and Applications

  4. Calandrino JM, Anderson JH (2008) Cache-aware real-time scheduling on multicore platforms: Heuristics and a case study. In: Euromicro Conference on Real-Time Systems, 2008. ECRTS’08. IEEE, pp 299–308

  5. Chang J, Sohi GS (2014) Cooperative cache partitioning for chip multiprocessors. In: ACM International Conference on Supercomputing 25th Anniversary Volume

  6. Chiou D, Jain P, Devadas S, Rudolph L (2000) Dynamic cache partitioning via columnization. In: DAC

  7. Chisholm M, Kim N, Ward BC, Otterness N, Anderson JH, Smith FD (2016) Reconciling the tension between hardware isolation and data sharing in mixed-criticality, multicore systems. In: RTSS

  8. Ding H, Liang Y, Mitra T (2012) WCET-centric partial instruction cache locking. In: DAC

  9. Ding H, Liang Y, Mitra T (2013) Integrated instruction cache analysis and locking in multitasking real-time systems. In: DAC

  10. Ebert C, Favaro J (2017) Automotive software. IEEE Softw 34(3):33–39


  11. El-Sayed N, Mukkara A, Tsai PA, Kasture H, Ma X, Sanchez D (2018) KPart: a hybrid cache partitioning-sharing technique for commodity multicores. In: HPCA

  12. Goel A, Abeni L, Krasic C, Snow J, Walpole J (2002) Supporting time-sensitive applications on a commodity OS. SIGOPS Oper Syst Rev 36(SI):165–180


  13. Guan N, Stigge M, Yi W, Yu G (2009) Cache-aware scheduling and analysis for multicores. In: Proceedings of the Seventh ACM International Conference on Embedded Software, ACM, pp 245–254

  14. Guo F, Solihin Y, Zhao L, Iyer R (2010) Quality of service shared cache management in chip multiprocessor architecture. ACM Trans Archit Code Optim 7(3):14


  15. Herdrich A, Verplanke E, Autee P, Illikkal R, Gianos C, Singhal R, Iyer R (2016) Cache QoS: from concept to reality in the Intel Xeon processor E5-2600 v3 product family. In: HPCA

  16. Iyer R (2004) CQoS: A framework for enabling QoS in shared caches of CMP platforms. In: Proceedings of the 18th Annual International Conference on Supercomputing, ICS ’04

  17. Jaleel A, Theobald KB, Steely SC Jr, Emer J (2010) High performance cache replacement using re-reference interval prediction (RRIP). In: ISCA

  18. Kaxiras S, Hu Z, Martonosi M (2001) Cache decay: exploiting generational behavior to reduce cache leakage power. In: ACM SIGARCH Computer Architecture News

  19. Kern D, Schmidt A (2009) Design space for driver-based automotive user interfaces. In: AutomotiveUI

  20. Kim H, Rajkumar RR (2018) Predictable shared cache management for multi-core real-time virtualization. TECS 17(1):22


  21. Kim H, Kandhalu A, Rajkumar R (2013) A coordinated approach for practical OS-level cache management in multi-core real-time systems. In: 2013 25th Euromicro Conference on Real-Time Systems

  22. Kim S, Chandra D, Solihin Y (2004) Fair cache sharing and partitioning in a chip multiprocessor architecture. In: PACT

  23. Kirk D, Strosnider J (1990) SMART (strategic memory allocation for real-time) cache design using the MIPS R3000. In: RTSS

  24. Lesage B, Puaut I, Seznec A (2012) Preti: Partitioned real-time shared cache for mixed-criticality real-time systems. In: Proceedings of the 20th International Conference on Real-Time and Network Systems

  25. Lin J, Lu Q, Ding X, Zhang Z, Zhang X, Sadayappan P (2008) Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In: HPCA

  26. Liu T, Li M, Xue CJ (2009) Instruction cache locking for real-time embedded systems with multi-tasks. In: RTCSA

  27. Lo D, Song T, Suh GE (2015) Prediction-guided performance-energy trade-off for interactive applications. In: MICRO

  28. Manikantan R, Rajan K, Govindarajan R (2012) Probabilistic shared cache management (PriSM). In: ISCA

  29. Paolieri M, Quiñones E, Cazorla FJ, Bernat G, Valero M (2009) Hardware support for WCET analysis of hard real-time multicore systems. In: ISCA

  30. Paolieri M, Quiñones E, Cazorla FJ, Bernat G, Valero M (2009) Hardware support for WCET analysis of hard real-time multicore systems. In: ACM SIGARCH Computer Architecture News, pp 57–68

  31. Puaut I, Decotigny D (2002) Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In: RTSS

  32. Puaut I, Pais C (2007) Scratchpad memories vs. locked caches in hard real-time systems: a quantitative comparison. In: DATE

  33. Qureshi M, Patt Y (2006) Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In: MICRO

  34. Rafique N, Lim WT, Thottethodi M (2006) Architectural support for operating system-driven CMP cache management. In: PACT

  35. Sanchez D, Kozyrakis C (2011) Vantage: scalable and efficient fine-grain cache partitioning. SIGARCH Comput Archit News 39(3):57–68


  36. Sangiovanni-Vincentelli A, Di Natale M (2007) Embedded system design for automotive applications. Computer 40(10):42–51


  37. Srikantaiah S, Kandemir M, Wang Q (2009) SHARP control: controlled shared cache management in chip multiprocessors. In: MICRO

  38. Subramanian L, Seshadri V, Ghosh A, Khan S, Mutlu O (2015) The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory. In: Proceedings of the 48th International Symposium on Microarchitecture, pp 62–75

  39. Suh GE, Rudolph L, Devadas S (2004) Dynamic partitioning of shared cache memory. J Supercomput 28(1):7–26


  40. Tam D, Azimi R, Stumm M (2007) Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors. ACM SIGOPS Oper Syst Rev 41:47–58


  41. Usui H, Subramanian L, Chang KKW, Mutlu O (2016) DASH: deadline-aware high-performance memory scheduler for heterogeneous systems with hardware accelerators. ACM Trans Archit Code Optim 12(4):65


  42. Vasilios K, Georgios K, Nikolaos V (2018) Combining software cache partitioning and loop tiling for effective shared cache management. ACM Trans Embedded Comput Syst (TECS) 17(3):72


  43. Wang X, Chen S, Setter J, Martínez JF (2017) Swap: Effective fine-grain management of shared last-level caches with minimum hardware support. In: HPCA

  44. Ward B, Herman J, Kenna C, Anderson J (2013) Making shared caches more predictable on multicore platforms. In: ECRTS

  45. Wilhelm R, Engblom J, Ermedahl A, Holsti N, Thesing S, Whalley D, Bernat G, Ferdinand C, Heckmann R, Mitra T, Mueller F, Puaut I, Puschner P, Staschulat J, Stenström P (2008) The worst-case execution-time problem: overview of methods and survey of tools. ACM Trans Embed Comput Syst 7(3):36


  46. Xie Y, Loh G (2008) Dynamic classification of program memory behaviors in CMPs. In: The 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects

  47. Xie Y, Loh GH (2009) PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. SIGARCH Comput Archit News 37(3):174–183


  48. Xu C, Rajamani K, Ferreira A, Felter W, Rubio J, Li Y (2018) dCat: dynamic cache management for efficient, performance-sensitive infrastructure-as-a-service. In: EuroSys

  49. Ye Y, West R, Cheng Z, Li Y (2014) Coloris: a dynamic cache partitioning system using page coloring. In: 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT)


Acknowledgements

This work was supported by National Research Foundation (NRF) grants funded by the Korean Government (2018R1A2B2005277).

Author information


Corresponding author

Correspondence to Myoungjun Lee.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Lee, M., Kim, S. Time-sensitivity-aware shared cache architecture for multi-core embedded systems. J Supercomput 75, 6746–6776 (2019). https://doi.org/10.1007/s11227-019-02891-w

