Abstract
In embedded systems such as automotive systems, multi-core processors are expected to improve performance and reduce manufacturing cost by integrating multiple functions on a single chip. However, inter-core interference in the shared last-level cache (LLC) results in increased and unpredictable execution times for time-sensitive tasks (TSTs), which have (soft) timing constraints, thereby increasing the deadline miss rates of such systems. In this paper, we propose a time-sensitivity-aware dead block-based shared LLC architecture to mitigate these problems. First, a time-sensitivity indication bit is added to each cache block, which allows the proposed LLC architecture to be aware of instructions/data belonging to TSTs. Second, portions of the LLC space are allocated to general tasks without interfering with TSTs by a time-sensitivity-aware dead block-based cache partitioning technique. Third, to further reduce the deadline miss rate of TSTs, we propose a task-matching and cache-partitioning scheme for shared caches that considers the memory access characteristics and time-sensitivity of tasks (TATS). TATS is combined with our proposed dead block-based scheme. Our evaluation shows that the proposed schemes reduce the deadline miss rates of TSTs compared to conventional shared caches. On a dual-core system, compared to a baseline, equal partitioning, and state-of-the-art quality-of-service-aware cache partitioning, our proposed dead block-based cache partitioning provides 9.3%, 30.5%, and 2.6% lower average deadline miss rates, respectively. On a quad-core system, compared to the same three configurations, the combination of our proposed schemes provides 21.2%, 17.7%, and 4.1% lower average deadline miss rates, respectively.
Notes
In this study, we do not consider applications with hard timing constraints, such as engine control and powertrain, which impose very strict timing requirements. We focus on applications that can tolerate some deadline misses, which are called time-sensitive applications in this paper and in [12].
The task categorization depends on the timing requirement of the task. If a task is executed periodically and has a (soft) deadline, it is categorized as a TST. For example, the decoding task of a video application that requires a frame rate of 30 fps can be considered a TST because it is executed periodically and must complete within 33 ms to achieve the required frame rate. On the other hand, if a task has no deadline but requires higher overall performance, it is categorized as a GT.
In this paper, performance predictability is defined as the inverse of the difference between the shortest and longest execution times of a task. Higher performance predictability indicates smaller performance variation and better tail performance for the task.
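The definition above can be written compactly as follows (using \(t_{\min}\) and \(t_{\max}\) for the shortest and longest observed execution times of a task; the notation is ours, not from the original text):

```latex
\mathrm{Predictability} = \frac{1}{t_{\max} - t_{\min}}
```

A task whose execution times cluster tightly (small \(t_{\max} - t_{\min}\)) thus has high predictability.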
We do not consider tasks having hard timing constraints in this paper.
We assume high utilization to examine the impact of inter-core cache interference. High utilization is preferred for lower manufacturing cost of the system.
In this study, we allocate equal size cache partitions to TSTs. However, the cache space is not wasted because the actual cache partitioning is dynamically done during runtime. When some tasks are idle, the other tasks can occupy their cache space.
The parameters are also used in Algorithms 3 and 4.
This is because the maximum number of parallel tasks in a multi-core system is equal to the number of cores. The maximum number of groups is equal to that of cores when only TSTs are running.
If the number of cache partitions is not evenly divisible by the number of current groups (e.g., 16 partitions shared by 3 groups), the remaining partitions are randomly allocated to the groups.
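The remainder-handling rule above can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function name and dictionary representation are our own.

```python
import random

def allocate_partitions(num_partitions, groups):
    """Evenly split cache partitions among groups; any leftover
    partitions are handed to randomly chosen groups, one each."""
    base = num_partitions // len(groups)
    alloc = {g: base for g in groups}
    remainder = num_partitions - base * len(groups)
    for g in random.sample(groups, remainder):
        alloc[g] += 1
    return alloc

# e.g., 16 partitions shared by 3 groups -> one group gets the extra partition
sizes = sorted(allocate_partitions(16, ["A", "B", "C"]).values())
print(sizes)  # [5, 5, 6]
```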
The number of cores used in this paper is at most 4; therefore, a 2-bit group field is used. Even with 64 cores, only 6 bits would be needed.
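The group-field width follows directly from the core count, since the number of groups never exceeds the number of cores. A minimal sketch of the relationship (function name ours):

```python
import math

def group_field_bits(num_cores):
    """Bits needed to encode a group ID when there is
    at most one group per core."""
    return max(1, math.ceil(math.log2(num_cores)))

print(group_field_bits(4))   # 2 bits, as used in the paper
print(group_field_bits(64))  # 6 bits
```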
\(t_{CL}\): CAS (Column Address Strobe) latency, \(t_{RCD}\): row address to column address delay, \(t_{RP}\): row precharge time.
In the experiments, we used configurations that fit the working sets of the MiBench benchmarks and that model inter-core cache interference under harsh conditions.
In this paper, we profile the benchmarks with a halved LLC. For more partitions, one can profile the tasks with LLCs partitioned into more than two segments. Nevertheless, halving the LLC space can be a good estimator of the performance sensitivities of tasks. A similar categorization is used in [46].
To estimate the probability density of the execution times, kernel density estimation (KDE) is applied to the data. For the kernel function, we used the normal (Gaussian) kernel, which places a standard normal kernel at each sampled execution time and averages them.
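A Gaussian KDE of this kind can be sketched with NumPy alone. This is an illustrative sketch under our own assumptions: the function name, the sample values, and the bandwidth are hypothetical, not from the paper.

```python
import numpy as np

def gaussian_kde(samples, xs, bandwidth):
    """Kernel density estimate with a normal (Gaussian) kernel:
    the average of a Gaussian bump centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    z = (xs[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Hypothetical execution-time samples (ms) and an evaluation grid
times = [31.0, 32.5, 33.1, 34.0, 36.2]
xs = np.linspace(28.0, 40.0, 241)
density = gaussian_kde(times, xs, bandwidth=1.0)

# The estimated density integrates to approximately 1 over the grid
print(round(float((density * (xs[1] - xs[0])).sum()), 2))
```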
References
Anderson JH, Bud V, Devi UC (2005) An EDF-based scheduling algorithm for multiprocessor soft real-time systems. In: Proceedings of the 17th Euromicro Conference on Real-Time Systems
Binkert N, Beckmann B, Black G, Reinhardt SK, Saidi A, Basu A, Hestness J, Hower DR, Krishna T, Sardashti S, Sen R, Sewell K, Shoaib M, Vaish N, Hill MD, Wood DA (2011) The gem5 simulator. SIGARCH Comput Archit News 39(2):1–7
Bui BD, Caccamo M, Sha L, Martinez J (2008) Impact of cache partitioning on multi-tasking real time embedded systems. In: Embedded and Real-Time Computing Systems and Applications
Calandrino JM, Anderson JH (2008) Cache-aware real-time scheduling on multicore platforms: Heuristics and a case study. In: Euromicro Conference on Real-Time Systems, 2008. ECRTS’08. IEEE, pp 299–308
Chang J, Sohi GS (2014) Cooperative cache partitioning for chip multiprocessors. In: ACM International Conference on Supercomputing 25th Anniversary Volume
Chiou D, Jain P, Devadas S, Rudolph L (2000) Dynamic cache partitioning via columnization. In: DAC
Chisholm M, Kim N, Ward BC, Otterness N, Anderson JH, Smith FD (2016) Reconciling the tension between hardware isolation and data sharing in mixed-criticality, multicore systems. In: RTSS
Ding H, Liang Y, Mitra T (2012) WCET-centric partial instruction cache locking. In: DAC
Ding H, Liang Y, Mitra T (2013) Integrated instruction cache analysis and locking in multitasking real-time systems. In: DAC
Ebert C, Favaro J (2017) Automotive software. IEEE Softw 34(3):33–39
El-Sayed N, Mukkara A, Tsai PA, Kasture H, Ma X, Sanchez D (2018) KPart: a hybrid cache partitioning-sharing technique for commodity multicores. In: HPCA
Goel A, Abeni L, Krasic C, Snow J, Walpole J (2002) Supporting time-sensitive applications on a commodity OS. SIGOPS Oper Syst Rev 36(SI):165–180
Guan N, Stigge M, Yi W, Yu G (2009) Cache-aware scheduling and analysis for multicores. In: Proceedings of the Seventh ACM International Conference on Embedded Software, ACM, pp 245–254
Guo F, Solihin Y, Zhao L, Iyer R (2010) Quality of service shared cache management in chip multiprocessor architecture. ACM Trans Archit Code Optim 7(3):14
Herdrich A, Verplanke E, Autee P, Illikkal R, Gianos C, Singhal R, Iyer R (2016) Cache QoS: from concept to reality in the Intel Xeon processor E5-2600 v3 product family. In: HPCA
Iyer R (2004) CQoS: A framework for enabling QoS in shared caches of CMP platforms. In: Proceedings of the 18th Annual International Conference on Supercomputing, ICS ’04
Jaleel A, Theobald KB, Steely SC Jr, Emer J (2010) High performance cache replacement using re-reference interval prediction (RRIP). In: ISCA
Kaxiras S, Hu Z, Martonosi M (2001) Cache decay: exploiting generational behavior to reduce cache leakage power. In: ACM SIGARCH Computer Architecture News
Kern D, Schmidt A (2009) Design space for driver-based automotive user interfaces. In: AutomotiveUI
Kim H, Rajkumar RR (2018) Predictable shared cache management for multi-core real-time virtualization. TECS 17(1):22
Kim H, Kandhalu A, Rajkumar R (2013) A coordinated approach for practical OS-level cache management in multi-core real-time systems. In: 2013 25th Euromicro Conference on Real-Time Systems
Kim S, Chandra D, Solihin Y (2004) Fair cache sharing and partitioning in a chip multiprocessor architecture. In: PACT
Kirk D, Strosnider J (1990) SMART (strategic memory allocation for real-time) cache design using the MIPS R3000. In: RTSS
Lesage B, Puaut I, Seznec A (2012) PRETI: partitioned real-time shared cache for mixed-criticality real-time systems. In: Proceedings of the 20th International Conference on Real-Time and Network Systems
Lin J, Lu Q, Ding X, Zhang Z, Zhang X, Sadayappan P (2008) Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems. In: HPCA
Liu T, Li M, Xue CJ (2009) Instruction cache locking for real-time embedded systems with multi-tasks. In: RTCSA
Lo D, Song T, Suh GE (2015) Prediction-guided performance-energy trade-off for interactive applications. In: MICRO
Manikantan R, Rajan K, Govindarajan R (2012) Probabilistic shared cache management (PriSM). In: ISCA
Paolieri M, Quiñones E, Cazorla FJ, Bernat G, Valero M (2009) Hardware support for WCET analysis of hard real-time multicore systems. In: ISCA
Paolieri M, Quiñones E, Cazorla FJ, Bernat G, Valero M (2009) Hardware support for WCET analysis of hard real-time multicore systems. In: ACM SIGARCH Computer Architecture News, pp 57–68
Puaut I, Decotigny D (2002) Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In: RTSS
Puaut I, Pais C (2007) Scratchpad memories vs. locked caches in hard real-time systems: a quantitative comparison. In: DATE
Qureshi M, Patt Y (2006) Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In: MICRO
Rafique N, Lim WT, Thottethodi M (2006) Architectural support for operating system-driven CMP cache management. In: PACT
Sanchez D, Kozyrakis C (2011) Vantage: scalable and efficient fine-grain cache partitioning. SIGARCH Comput Archit News 39(3):57–68
Sangiovanni-Vincentelli A, Di Natale M (2007) Embedded system design for automotive applications. Computer 40(10):42–51
Srikantaiah S, Kandemir M, Wang Q (2009) SHARP control: controlled shared cache management in chip multiprocessors. In: MICRO
Subramanian L, Seshadri V, Ghosh A, Khan S, Mutlu O (2015) The application slowdown model: Quantifying and controlling the impact of inter-application interference at shared caches and main memory. In: Proceedings of the 48th International Symposium on Microarchitecture, pp 62–75
Suh GE, Rudolph L, Devadas S (2004) Dynamic partitioning of shared cache memory. J Supercomput 28(1):7–26
Tam D, Azimi R, Stumm M (2007) Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors. ACM SIGOPS Oper Syst Rev 41:47–58
Usui H, Subramanian L, Chang KKW, Mutlu O (2016) DASH: deadline-aware high-performance memory scheduler for heterogeneous systems with hardware accelerators. ACM Trans Archit Code Optim 12(4):65
Vasilios K, Georgios K, Nikolaos V (2018) Combining software cache partitioning and loop tiling for effective shared cache management. ACM Trans Embedded Comput Syst (TECS) 17(3):72
Wang X, Chen S, Setter J, Martínez JF (2017) Swap: Effective fine-grain management of shared last-level caches with minimum hardware support. In: HPCA
Ward B, Herman J, Kenna C, Anderson J (2013) Making shared caches more predictable on multicore platforms. In: ECRTS
Wilhelm R, Engblom J, Ermedahl A, Holsti N, Thesing S, Whalley D, Bernat G, Ferdinand C, Heckmann R, Mitra T, Mueller F, Puaut I, Puschner P, Staschulat J, Stenström P (2008) The worst-case execution-time problem: overview of methods and survey of tools. ACM Trans Embed Comput Syst 7(3):36
Xie Y, Loh G (2008) Dynamic classification of program memory behaviors in CMPs. In: 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects
Xie Y, Loh GH (2009) PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. SIGARCH Comput Archit News 37(3):174–183
Xu C, Rajamani K, Ferreira A, Felter W, Rubio J, Li Y (2018) dCat: dynamic cache management for efficient, performance-sensitive infrastructure-as-a-service. In: EuroSys
Ye Y, West R, Cheng Z, Li Y (2014) Coloris: a dynamic cache partitioning system using page coloring. In: 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT)
Acknowledgements
This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean Government (2018R1A2B2005277).
Lee, M., Kim, S. Time-sensitivity-aware shared cache architecture for multi-core embedded systems. J Supercomput 75, 6746–6776 (2019). https://doi.org/10.1007/s11227-019-02891-w