Data re-allocation enabled cache locking for embedded systems

https://doi.org/10.1016/j.sysarc.2016.12.002

Abstract

Cache locking is a cache management technique that prevents locked contents from being replaced. Cache locking methods were first proposed to improve predictability and worst-case execution time (WCET). More recently, instruction cache locking has also been applied to improve average-case execution time (ACET). However, we observe that the previous ACET-driven instruction cache locking technique yields very limited performance improvement when applied to the data cache. The underlying reason is that the object similarity of data accesses within data memory blocks is relatively low. This paper presents a data re-allocation enabled cache locking framework in which data objects are first re-allocated to enhance data object similarity within memory blocks, and data cache locking is then applied to the re-allocated layout. In this way, the locking efficiency of the data cache is enhanced and system performance is improved. Experimental results show improvements in miss rate, memory access cycles and dynamic energy across a suite of benchmarks.

Introduction

Caches are widely used in embedded systems to bridge the widening processor-memory performance gap. Cache locking is a technique that further improves cache efficiency and is supported by many modern processors, such as the MIPS32 series [1], the Intel XScale series [2], the ARM processor series [3] and MPC processors [4]. Cache locking is implemented by executing a special locking routine. It gives a processor the ability to prevent part or all of its instructions or data from being evicted until they are unlocked. Once a memory block is locked in the cache, all subsequent accesses to the locked contents are cache hits. Although locking contents in the cache avoids the cache miss penalty for those contents, it reduces the available cache space, because any locked cache resource is unavailable for caching other parts of main memory. Hence, the benefit-cost tradeoff should be carefully considered when applying cache locking.
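
The benefit-cost tradeoff described above can be illustrated with a small simulation. The following is a minimal sketch, not the locking routine of any particular processor: it assumes an LRU set-associative cache addressed by block number, assumes that locked blocks are preloaded at no cost, and assumes that accesses to a fully locked set bypass the cache. The function name `count_misses` and its parameters are hypothetical.

```python
from collections import OrderedDict

def count_misses(trace, num_sets, ways, locked=frozenset()):
    """Count misses for a trace of block numbers on an LRU set-associative cache.

    Assumption: blocks in `locked` are preloaded and never evicted, and each
    one permanently occupies one way of the set it maps to.
    """
    # Ways still available for ordinary caching in each set.
    free_ways = [ways] * num_sets
    for block in locked:
        free_ways[block % num_sets] -= 1
    if min(free_ways) < 0:
        raise ValueError("more locked blocks than ways in some set")

    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU order
    misses = 0
    for block in trace:
        if block in locked:
            continue  # locked contents always hit
        s = block % num_sets
        lru = sets[s]
        if block in lru:
            lru.move_to_end(block)  # hit: mark as most recently used
            continue
        misses += 1
        if free_ways[s] == 0:
            continue  # set is fully locked, so the block is not cached
        if len(lru) >= free_ways[s]:
            lru.popitem(last=False)  # evict the least recently used block
        lru[block] = True
    return misses

# Three blocks cycling through one 2-way set: locking helps.
thrash = [0, 1, 2] * 4
print(count_misses(thrash, num_sets=1, ways=2))              # 12
print(count_misses(thrash, num_sets=1, ways=2, locked={0}))  # 8

# Block 0 is used once, then blocks 1 and 2 alternate: locking block 0 hurts.
cold = [0] + [1, 2] * 4
print(count_misses(cold, num_sets=1, ways=2))                # 3
print(count_misses(cold, num_sets=1, ways=2, locked={0}))    # 8
```

In the first trace the locked block turns its own accesses into hits. In the second trace the locked way is wasted on a block that is never reused, and the remaining single way thrashes.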

Recent work has shown that instruction cache locking can improve the average-case execution time (ACET) of a program by eliminating the evictions that result from conflicting accesses [6]. However, data cache locking is not equally effective. Data cache analysis is more complicated than instruction cache analysis for two major reasons. First, the locality of a reference is harder to capture for data accesses. Second, a single instruction may refer to multiple memory locations [7]. In other words, the data cache has poorer locality than the instruction cache. Prior work identifies four types of data reference reuse: self-temporal reuse, self-spatial reuse, group-temporal reuse and group-spatial reuse [8]. As a result, reuse analysis of data objects, a key step in determining the contents to lock, is very complicated for the data cache. In this paper, a data object denotes a generic variable that holds an individual value. The term is used to distinguish a single value from a vector, a matrix or another array. The size of a data object depends on its type, such as integer, float, char, string or Boolean.
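
To make the four reuse types concrete, the sketch below classifies them from the addresses that each array reference touches in a loop. It is a simplified, trace-based reading of the terms, whereas the cited work [8] defines reuse formally on loop index functions. The loop, the array base addresses, the line size of four elements and the function name `classify_reuse` are all assumptions made for illustration.

```python
from itertools import combinations

def classify_reuse(refs, line_size):
    """Classify reuse from per-reference address lists (one address per iteration).

    Returns a set of (kind, reference names) tuples.
    """
    found = set()
    addrs = {name: set(seq) for name, seq in refs.items()}
    lines = {name: {a // line_size for a in s} for name, s in addrs.items()}
    for name, seq in refs.items():
        if len(addrs[name]) < len(seq):          # some address is revisited
            found.add(("self-temporal", (name,)))
        if len(lines[name]) < len(addrs[name]):  # distinct addresses share a line
            found.add(("self-spatial", (name,)))
    for x, y in combinations(refs, 2):
        if addrs[x] & addrs[y]:                  # both touch a common address
            found.add(("group-temporal", (x, y)))
        elif lines[x] & lines[y]:                # common line, no common address
            found.add(("group-spatial", (x, y)))
    return found

# Hypothetical loop: for j in range(4): use X[j], X[j+1], Y[0], Z[2*j], Z[2*j+1]
# Addresses are in element units; X starts at 0, Y at 100, Z at 200.
refs = {
    "X[j]":    [0 + j for j in range(4)],
    "X[j+1]":  [0 + j + 1 for j in range(4)],
    "Y[0]":    [100 for _ in range(4)],
    "Z[2j]":   [200 + 2 * j for j in range(4)],
    "Z[2j+1]": [200 + 2 * j + 1 for j in range(4)],
}
for kind, names in sorted(classify_reuse(refs, line_size=4)):
    print(kind, names)
```

Here `Y[0]` shows self-temporal reuse, the `X` and `Z` references show self-spatial reuse, `X[j]` and `X[j+1]` show group-temporal reuse, and `Z[2j]` and `Z[2j+1]` show group-spatial reuse because they share cache lines without ever touching the same element.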

In this paper, we conduct a set of experiments and observe that the conventional locking heuristic used for instruction caches to improve ACET is inefficient in the data cache for most benchmarks. By analyzing the relationship between data cache locking and data cache properties, we make two observations. First, the object similarity of data access behavior in a data memory block is weaker than that in an instruction memory block. The term object similarity refers to how similarly the data objects in a memory block are accessed across execution flows. Second, analysis based on a single trace is not suitable for data cache locking, because accesses to the data cache are more sensitive to input data sets than accesses to the instruction cache. We consider that this input sensitivity cannot be neglected. We are therefore motivated to propose a solution in which the data layout is first reorganized to improve data object similarity inside memory blocks, and cache locking is then determined based on the entire program control flow graph. In this way, data cache locking is expected to deliver a larger benefit.

This paper proposes a data re-allocation enabled data cache locking technique that improves the ACET performance of programs based on global data flow analysis. The main idea is to reorganize the data layout by placing the data objects with the largest interference frequency into the same memory block, and then to select the memory block with the highest interference risk for locking. Reducing interference reduces the number of possible cache misses, which in turn improves ACET performance. The proposed approach has a twofold effect. First, data re-allocation eliminates a portion of the potential conflict misses while also improving the similarity of data objects inside a memory block. Second, cache locking achieves its highest efficiency when it builds on the re-allocated layout. The experimental results show that the proposed approach, data re-allocation followed by data cache locking, can be an effective strategy for improving system ACET performance. The contributions of this paper include:

  • Observations from a set of experiments show that the conventional instruction cache locking approach has little effect on the data cache. Furthermore, data cache locking efficiency is sensitive to program inputs.

  • These observations are traced to the fact that the data cache exhibits inherently poorer object similarity than the instruction cache.

  • A data re-allocation approach is proposed that reorganizes data objects in memory so as to minimize the memory block interference frequency and improve object similarity inside data memory blocks.

  • Techniques for building a data interference graph and a memory block interference graph are proposed to direct data re-allocation and data cache locking, respectively, to achieve performance improvement.
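
The two-graph idea (group the objects that interfere most into one memory block, then lock the block with the highest interference risk) can be sketched as follows. This excerpt does not include the paper's definitions of interference frequency or interference risk, so the sketch makes its own assumptions: two objects interfere once each time they are accessed back to back in a trace, a block's risk is the total weight of its edges to objects in other blocks, grouping is greedy, and a block holds two objects. Cache set mapping is ignored. All function names are hypothetical, and this is not the authors' algorithm.

```python
from collections import Counter

def data_interference_graph(trace):
    """Assumed edge weight: times two distinct objects are accessed back to back."""
    weights = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            weights[frozenset((a, b))] += 1
    return weights

def reallocate(objects, weights, capacity):
    """Greedily merge groups joined by the heaviest edges while they fit in a block."""
    group_of = {obj: frozenset([obj]) for obj in objects}
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
    for edge, _ in ranked:
        a, b = sorted(edge)
        ga, gb = group_of[a], group_of[b]
        if ga != gb and len(ga) + len(gb) <= capacity:
            merged = ga | gb
            for obj in merged:
                group_of[obj] = merged
    return sorted({tuple(sorted(g)) for g in group_of.values()})

def block_risk(blocks, weights):
    """Assumed risk of a block: total weight of edges leaving that block."""
    block_of = {obj: i for i, blk in enumerate(blocks) for obj in blk}
    risk = Counter()
    for edge, w in weights.items():
        a, b = edge
        if block_of[a] != block_of[b]:
            risk[block_of[a]] += w
            risk[block_of[b]] += w
    return risk

# Hypothetical access trace over six data objects.
trace = list("acacac" "bdbdbd" "efefef" "abab" "cece")
weights = data_interference_graph(trace)
blocks = reallocate("abcdef", weights, capacity=2)
risk = block_risk(blocks, weights)
to_lock = max(risk, key=risk.get)

print(blocks)           # [('a', 'c'), ('b', 'd'), ('e', 'f')]
print(dict(risk))       # block 0 has risk 9, block 1 has 6, block 2 has 5
print(blocks[to_lock])  # ('a', 'c')
```

A declaration-order layout would place a with b and c with d, leaving the heavily interleaved pairs (a, c) and (b, d) in different blocks. The greedy grouping co-locates them, and the block that still interferes most with the others becomes the locking candidate.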

The rest of the paper is organized as follows. Related work is introduced in Section 2. Observations on data cache locking are presented and discussed in Section 3. In Section 4, a motivational example is presented and the data re-allocation enabled cache locking problem is addressed. Section 5 presents the detailed solution framework. Section 6 evaluates the proposed approach through a set of experiments. Finally, Section 7 concludes the paper.

Section snippets

Related work

In previous work, cache locking has been studied as a promising way to improve predictability in hard real-time systems. Because the hits and misses of references can be counted statically under cache locking, memory access cycles can be determined accurately, leading to tight worst-case execution time (WCET) analysis. Asaduzzaman et al. [9] proposed an instruction cache locking scheme that locks the blocks causing the most misses, using offline analysis on the Heptane platform. Liu and Xue [10]

Insights to data cache locking

In this section, we first give some preliminaries on cache locking, and then analyze the properties of data cache locking by comparing it with instruction cache locking based on a set of experiments.

Motivation

In this section, we present a motivational example showing that data re-allocation followed by data cache locking can effectively improve system ACET performance. Fig. 2(a) shows a simple memory-cache mapping scenario with a 2-way set associative cache. For ease of explanation, we make the following assumptions for this example.

  • All data objects, a, b, c, d, e and f, are of the same size and each memory block (or cache line) is assumed to be able to hold only two

Data re-allocation enabled cache locking framework

The objective of this paper is to minimize total cache misses and improve ACET performance. For each possible data layout, we can determine a sub-optimal locking solution. Therefore, a globally optimal locking solution exists among all these sub-optimal solutions. Since we would have to work out all possible data layouts and their corresponding sub-optimal locking solutions and then select the global optimum, the problem is very likely NP-hard, though we have not attempted to prove

Experimental evaluation

In this section, we first introduce the experimental setup and then present the experimental results of using the proposed data re-allocation enabled cache locking framework.

Conclusion

This paper presents a data re-allocation enabled data cache locking framework to improve the ACET performance of embedded systems. Interference analyses of data objects and memory blocks are conducted on the global control flow to direct data re-allocation and locking decisions. The proposed approach reduces cache misses and improves performance, and may open further opportunities for data cache optimization.

References (21)

  • MIPS32:...
  • Intel:...
  • ARM9:...
  • MPC:...
  • Nios II:...
  • Y. Liang et al. Instruction cache locking using temporal reuse profile. Proceedings of the 47th Design Automation Conference (DAC) (2010)
  • X. Vera et al. Data cache locking for higher program predictability. Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (2003)
  • M.S. Lam et al. A data locality optimizing algorithm. SIGPLAN Not. (2004)
  • A. Asaduzzaman et al. Evaluation of I-cache locking technique for real-time embedded systems. IEEE Fourth International Conference on Innovations in Information Technology (2007)
  • M.L.T. Liu et al. Minimizing WCET for real-time embedded systems via static instruction cache locking. 15th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) (2009)
There are more references available in the full text version of this article.

Chun Xue received the B.S. degree in Computer Science and Engineering from the University of Texas at Arlington in May 1997, and the M.S. and Ph.D. degrees in Computer Science from the University of Texas at Dallas in December 2002 and May 2007, respectively. He is now an associate professor in the Department of Computer Science at the City University of Hong Kong. His research interests include memory and parallelism optimization for embedded systems, software/hardware co-design, real-time systems and computer security.

Keni Qiu received the B.S. degree from the Department of Information Technology, Central China Normal University, Wuhan, China, in 2001, the M.S. degree from the University of Chinese Academy of Sciences (UCAS), Beijing, China, in 2004, and the Ph.D. degree from the City University of Hong Kong, China, in 2014. She worked as an electronic engineer at the Center of Space Science and Application Research (CSSAR), Chinese Academy of Sciences (CAS), Beijing, China, from 2004 to 2010. She is currently a research associate in the College of Information Engineering at Capital Normal University (CNU), Beijing, China. Her current research interests include embedded systems and non-volatile memory optimizations.

Weigong Zhang received the B.S. degree from the Department of Computer Science and Technology, Dalian University of Technology, Dalian, China, in 1989 and the Ph.D. degree from Xidian University, Xi’an, China, in 2005. He worked as vice-chief engineer at the Xi’an Microelectronics Technology Institute, Xi’an, China, from 1989 to 2007. He is currently a research professor in the College of Information Engineering at Capital Normal University (CNU), Beijing, China. His research interests include reliable embedded systems and device interconnection.

Jing Wang received the B.S. degree from the Department of Computer Science and Technology of Harbin Institute University, Harbin, China, in 2005 and the Ph.D. degree from Peking University, Beijing, China, in 2011. She is currently an assistant professor in the College of Information Engineering at Capital Normal University (CNU), Beijing, China. Her current research interests include computer architecture, big data and cloud computing.

Yuanchao Xu received the B.S. degree in Computer Science from Beijing Institute of Technology, Beijing, in July 1998, the M.S. degree in Computer Science from Beihang University, Beijing, in April 2002, and the Ph.D. degree in Computer Architecture from the Institute of Computing Technology, Chinese Academy of Sciences, in July 2012. He is now an associate professor in the College of Information Engineering at Capital Normal University, Beijing, China. His research interests include operating system support for emerging memory technologies and architecture.

Mengying Zhao received the B.S. degree from the Department of Computer Science and Technology of Shandong University, Jinan, China, in 2011 and the Ph.D. degree from the City University of Hong Kong, China, in 2014. She is currently an assistant professor in the Department of Computer Science and Technology of Shandong University, Jinan, China. Her research interests include embedded and real-time systems, architecture, and memory optimizations.
