On the performance of trace locality of reference
Introduction
Hierarchical systems are used extensively in computing. Most systems use a cache to decrease access time; examples include file systems, network servers, web proxies, network clients, and memory management systems.
In a hierarchical model, objects are fetched and placed into a layer when a miss occurs, or they are prefetched. If there is no room for the new object, a replacement algorithm selects a victim. Both prefetching and replacement algorithms rely on predicting system behavior, and the success of a hierarchical system depends on the accuracy of that prediction.
Although the concepts developed in this paper are general and can be applied to any hierarchical system, this paper focuses on memory management systems and studies their data access patterns. The results presented here can be used for caching (cache miss handling), address translation (TLB management), and virtual memory management (page fault handling).
Various methods have been studied to model system behavior and predict the objects required in the near future. Locality of reference (LoR) [11] is among them. Two LoR types are defined in the literature: spatial and temporal. Most replacement algorithms, including LRU, and the effectiveness of hierarchical systems in general, are based on temporal LoR [9], [43]; some prefetching algorithms, such as sequential prefetching [42] and stream buffers [31], are based on spatial LoR. Similarly, fetching a whole cache block or page frame instead of just the requested word (itself a form of prefetching) is based on spatial LoR [44].
However, a variety of access patterns are not covered by the traditional LoR types [18], [29]. The fact that traditional LoR types do not capture all access patterns has motivated many prediction algorithms. Examples are prefetch algorithms for array-based programs [3], [5] and algorithms for pointer-intensive programs [4], [10], [21], [22], [24], [25], [26], [27], [36], [37].
The main problem with these approaches is that they are tied to special cases: the program segment exhibiting the expected behavior must first be identified, and only then can the algorithm be applied to it. Most of them are offline algorithms that require compiler support [3], [4].
A group of prediction methods uses the system trace, defined as the sequence of accessed objects. If an object was accessed previously and is accessed again, such methods predict that the objects near it in the trace will be accessed as well. Branch prediction algorithms use the past outcomes of a branch instruction to predict whether it will be taken the next time it executes. Trace caches and trace processors [19], [20], [32], [34], [35] extend the idea of branch prediction to predict the next basic blocks of code to be executed.
For data access patterns, recency-based prediction [38], based on the LRU stack, and a number of frequency-based graph algorithms [13], [17], [22] use the system trace to predict future behavior. The graph algorithms are the Markov predictor [22], [30], the access graph [13], and the probability graph [17]. Markov prediction has been used in the context of cache prefetching [22] and I/O prediction [30]; the access graph has been used in virtual memory management, and the probability graph in file systems.
This paper formalizes the concept of trace LoR in general and shows how it can be used to predict future data accesses in memory management systems. Section 2 reviews related work. Section 3 defines trace LoR as a general property of most systems. Section 4 introduces the trace graph for capturing trace LoR. Section 5 presents the benchmark results, including the effects of system configuration. Section 6 introduces extensions to the trace graph that predict the correct behavior when more than one trace is associated with an object. Section 7 defines n-stride prediction and evaluates its usefulness. Finally, Section 8 concludes the paper.
Section snippets
Related work
In the context of memory management, various models have been suggested to predict a program's behavior and prefetch objects. There are three types of prefetching techniques: offline, online, and hybrid. Offline techniques rely on a compiler to analyze the program and insert prefetch instructions into the code [6], [15], [46]. Online algorithms detect access patterns at runtime, and hybrid methods [7], [8], [16], [24], [40], [45], [47] use both compiler analysis and runtime behavior to predict the
Trace locality of reference
Consider the algorithm used to find an item in a linked list. It first examines the first element; if it matches the requested item, the search completes and the element is returned.
Otherwise, the second element is examined, and the process continues until the item is found or the list is exhausted. This behavior is repeated whenever an item is searched for in the linked list; in fact, most linked-list processing functions behave the same way.
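As a concrete illustration of this repeated behavior, the following sketch (illustrative names, not the paper's code) records the keys visited by a linked-list search; repeated searches touch the nodes in the same order, producing overlapping traces:

```python
# Searching a linked list touches nodes in the same order on every
# search, so the access trace repeats (and repeated prefixes appear).

class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt

def find(head, key, trace):
    """Linear search; records each visited node's key in `trace`."""
    node = head
    while node is not None:
        trace.append(node.key)   # the "object access" a predictor would see
        if node.key == key:
            return node
        node = node.next
    return None

# Build the list 10 -> 20 -> 30 -> 40.
head = Node(10, Node(20, Node(30, Node(40))))

t1, t2 = [], []
find(head, 30, t1)
find(head, 40, t2)
print(t1)  # [10, 20, 30]
print(t2)  # [10, 20, 30, 40] -- the earlier trace repeats as a prefix
```

A trace-based predictor can exploit exactly this repetition: having seen 10 followed by 20 once, it predicts 20 the next time 10 is accessed.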
As another example,
Definition
To make predictions using trace LoR, one must store the trace of the system in a form that can be used for prediction. In the simplest case, one wishes to predict only the next object and to use only the last occurrence of the current object. For this purpose, the trace graph is introduced.
Trace graph. For each object in the object space, one node is created. Each node has at most one outgoing edge. If the trace contains the sequence ⟨a, b⟩, an edge is created from the node of a to the node of b.
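A minimal sketch of such a trace graph, assuming a hash map as the node/edge store (the paper's concrete data structure may differ): each access overwrites the previous object's single outgoing edge, and the prediction for the current object is whatever followed it at its last occurrence.

```python
# Hypothetical trace-graph sketch: one node per object, at most one
# outgoing edge per node, updated as the trace is observed.

class TraceGraph:
    def __init__(self):
        self.edge = {}        # object -> successor seen at its last occurrence
        self.prev = None      # previously accessed object

    def access(self, obj):
        """Record an access; return the predicted next object (or None)."""
        if self.prev is not None:
            self.edge[self.prev] = obj   # overwrite: at most one outgoing edge
        self.prev = obj
        return self.edge.get(obj)        # predict from the last occurrence

g = TraceGraph()
trace = ["a", "b", "c", "a"]
preds = [g.access(x) for x in trace]
print(preds)  # [None, None, None, 'b'] -- 'a' was last followed by 'b'
```

On the second access to "a", the graph predicts "b", the object that followed "a" in the earlier part of the trace.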
Experiments
To evaluate the effect of prediction using trace LoR on system performance, SimpleScalar [1] was used, extended to extract the required information. The benchmarks were taken from the SPEC CPU 2000 suite, and each was run for 20,000,000 addresses. Table 1 lists the benchmarks used.
The prediction accuracy (the ratio of correct predictions to the total number of predictions) was measured for each program and each of its input sets. Then, for each
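The accuracy metric defined above can be sketched as follows, using a simple last-successor table as a stand-in for the full trace graph (names are illustrative, not the paper's measurement harness):

```python
# Prediction accuracy = correct predictions / total predictions issued.

def prediction_accuracy(trace):
    last_succ = {}            # object -> successor at its last occurrence
    prev = pred = None
    correct = total = 0
    for obj in trace:
        if pred is not None:  # a prediction was issued on the previous access
            total += 1
            correct += (pred == obj)
        if prev is not None:
            last_succ[prev] = obj
        pred = last_succ.get(obj)   # prediction for the next access
        prev = obj
    return correct / total if total else 0.0

# A perfectly repeating access pattern is predicted with accuracy 1.0:
print(prediction_accuracy(list("abcabcabc")))
```

For the repeating trace above, every prediction issued after the first cycle is correct, so the accuracy is 1.0; irregular traces lower the ratio.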
Enhancing the model
An object can have more than one associated trace in two ways. To illustrate the first, note that a program may not follow its previous behavior exactly: depending on the input parameters, a code fragment accesses different objects and follows different execution paths. In this way, the same code fragment produces different traces.
As an example, consider the linked-list search again, and suppose that each item's key is an ordered pair. The code to find an item in this list is shown in Fig.
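One plausible way to handle multiple successors per object (an assumption for illustration, not necessarily the paper's exact extension) is to keep a count for each observed successor and predict the most frequent one rather than only the most recent:

```python
# Multi-edge trace graph sketch: each node keeps a counter per successor
# and predicts the successor seen most often.

from collections import Counter, defaultdict

class MultiEdgeTraceGraph:
    def __init__(self):
        self.succ = defaultdict(Counter)  # object -> Counter of successors
        self.prev = None

    def access(self, obj):
        """Record an access; return the most frequent successor (or None)."""
        if self.prev is not None:
            self.succ[self.prev][obj] += 1
        self.prev = obj
        counts = self.succ[obj]
        return counts.most_common(1)[0][0] if counts else None

g = MultiEdgeTraceGraph()
for x in ["a", "b", "a", "b", "a", "c", "a"]:
    pred = g.access(x)
print(pred)  # 'b' -- 'a' was followed by 'b' twice but by 'c' only once
```

A recency-based tie-break or a small bounded edge set would be alternative designs with lower storage cost; the frequency counter above simply makes the "more than one trace" case concrete.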
n-Stride prediction
The trace graph is the simplest use of trace LoR for predicting system behavior. This section explores a more sophisticated use of trace LoR, which we call n-stride prediction. It should be noted that it differs from the stride prediction discussed in [14].
n-Stride prediction shows that an access can be predicted well before it happens. Note that for a prediction to be useful, not only must the prediction be correct, but there must also be sufficient time to fetch the
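A hedged sketch of the lookahead idea: predict the object that appeared n positions later in the trace the last time the current object was seen, so the fetch can begin n accesses in advance. The exact mechanism in the paper may differ; this illustrates only the lookahead.

```python
# n-stride lookahead sketch: remember, for each object, what was accessed
# n positions after its previous occurrence, and predict that on re-access.

from collections import deque

class NStridePredictor:
    def __init__(self, n):
        self.n = n
        self.window = deque(maxlen=n)  # last n accessed objects
        self.ahead = {}                # object -> object seen n accesses later

    def access(self, obj):
        """Record an access; return a prediction for n accesses ahead."""
        if len(self.window) == self.n:
            self.ahead[self.window[0]] = obj   # obj came n steps after window[0]
        self.window.append(obj)
        return self.ahead.get(obj)

p = NStridePredictor(2)
preds = [p.access(x) for x in "abcdabcd"]
print(preds)  # [None, None, None, None, 'c', 'd', 'a', 'b']
```

With n = 2, the second pass over the repeating trace predicts each object two accesses early, leaving time to fetch it before it is needed.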
Conclusion
In this paper, the concept of trace LoR was developed. If a system's behavior has the trace LoR property, the trace of its accesses can be stored to predict its future behavior. A simple model, called the trace graph, was developed for this purpose. In the next phase, the model was enhanced to improve its effectiveness. In addition, the effects of system parameters on the trace graph were measured.
If an object occurs more than once in the trace, it is not clear which occurrence should be
Ali Mahjur received his B.S. and M.S. degrees in computer engineering from Sharif University of Technology (SUT), Iran, in 1996 and 1998, respectively. He has been a Ph.D. student in computer engineering at SUT since then. His research interests include Computer Architecture, Operating Systems, Memory Management Systems, and Programming Languages.
References (47)
- et al. SimpleScalar: an infrastructure for computer system modeling. IEEE Comput. (2002)
- et al. An effective on-chip preloading scheme to reduce data access penalty
- et al. Tolerating latency by prefetching Java objects
- et al. Data flow analysis for software prefetching linked data structures in Java controller
- et al. Simple and effective array prefetching in Java
- et al. Software prefetching
- et al. Effective hardware-based data prefetching for high performance processors. IEEE Trans. Comput. (1995)
- An effective programmable prefetch engine for high performance processors
- LRU is better than FIFO
- et al. A stateless, content-directed data prefetching mechanism
- The working set model for program behavior. Commun. ACM
- Memory-system design considerations for dynamically scheduled processors
- Experimental studies of access graph-based heuristics: beating the LRU standard
- Stride directed prefetching in scalar processors
- Precise miss analysis for program transformations with caches of arbitrary associativity
- An integrated hardware/software data prefetching scheme for shared-memory multiprocessors
- Reducing file system latency using a predictive approach
- A comparison of locality transformations for irregular codes
- Path-based next trace prediction
- Trace preconstruction
- Run-time cache bypassing. IEEE Trans. Comput.
- Prefetching using Markov predictors. IEEE Trans. Comput.
- Improving direct-mapped cache performance by the addition of a small fully associative cache and prefetch buffers
Cited by (2)
- BitTorrent traffic from a caching perspective. Journal of the Brazilian Computer Society (2013)
- Two-phase prediction of L1 data cache misses. IEE Proceedings: Computers and Digital Techniques (2006)
Amir Hossein Jahangir received his Ph.D. degree in industrial informatics from the Department of Electrical Engineering, Institut National des Sciences Appliquées, Toulouse, France, in 1989. Since then, he has been with the Department of Computer Engineering, Sharif University of Technology, Iran, where he has taught several hardware architecture courses and supervised related research projects. From 1990 to 1994 he was the head of the department and has held several other responsibilities thereafter. His research interests include High-Performance Computer Architectures, Analysis of Network Devices, and the Design of Real-Time and Fault-Tolerant Systems.
Amir Hossein Gholamipour will receive his B.Sc. from the Department of Computer Engineering, Sharif University of Technology, Iran by June 2005. His research interests include Computer Architecture, Real-Time systems, and Embedded Systems.
1. It should be noted that the n-stride prediction introduced in this paper differs from the stride prefetching introduced in other research.