Dynamic reuse of subroutine results

https://doi.org/10.1016/j.sysarc.2006.06.002

Abstract

The paper discusses the concept of dynamic reuse of subroutine results. The described technique uses a hardware mechanism that caches the address of the called subroutine along with its arguments and returned value. When the same subroutine is called again with the same arguments, the returned value can be accurately predicted without actually executing the subroutine. Although this approach can be highly effective in some cases, it is limited to subroutines that have no side effects and use only by-value parameter passing. Since the proposed method requires that both the user and the compiler be aware of the mechanism, it might be more appropriate for specific computing-intensive applications than for standard all-purpose programming.

Introduction

In computer systems, predicting future behavior can be used to enhance performance [10], [13]. While little can be learned at compile time about the true run-time behavior of a program, the future can be predicted at run time based on previous events. This approach is commonly used to predict branches [1], [13] and values [10].

Many of the recent microprocessor architectures use value prediction [10], [3], [8] in order to increase parallelism. Since the variation of values being computed during the execution of a single program is relatively low [9], some of the values can be predicted at run-time before the actual execution of the computation. Value prediction techniques are used for reducing latencies caused by data dependencies [4], and branch prediction is used for reducing the latencies caused by control dependencies.

Sodani and Sohi [14] presented a technique for eliminating the execution of instructions by dynamically reusing previously executed instructions. However, information about previously executed instructions can be used not only to eliminate the execution of a single instruction, but also to eliminate the execution of some sequences of instructions. Kumar [7] presented a technique that uses a compiler-generated cache for reusing function calls, and showed that a substantial number of identical subroutine calls are executed during the execution of a computer program. In this paper we propose a hardware mechanism that eliminates subroutine calls by reusing the returned values of previous calls.
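The reuse idea, and the side-effect restriction noted in the abstract, can be illustrated in software. The following is a minimal sketch and not the hardware mechanism proposed in the paper: the names `call_with_reuse`, `ReuseEntry`, `square` and `next_ticket`, the table size, and the direct-mapped indexing are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical software analogue of result reuse: each entry records
 * the subroutine address, its single by-value argument, and the
 * returned value. All names and sizes are assumptions of this sketch. */
typedef long (*Subroutine)(long);

typedef struct {
    int        valid;
    Subroutine addr;  /* address of the called subroutine */
    long       arg;   /* argument value, passed by value  */
    long       ret;   /* value returned by that call      */
} ReuseEntry;

enum { REUSE_ENTRIES = 16 };
static ReuseEntry reuse_table[REUSE_ENTRIES];

/* Call f(arg), or reuse the recorded result of an identical call. */
static long call_with_reuse(Subroutine f, long arg)
{
    ReuseEntry *e = &reuse_table[(unsigned long)arg % REUSE_ENTRIES];

    assert(f != 0);
    if (e->valid && e->addr == f && e->arg == arg)
        return e->ret;            /* hit: the subroutine is not executed */

    e->ret   = f(arg);            /* miss: execute and record the call */
    e->addr  = f;
    e->arg   = arg;
    e->valid = 1;
    return e->ret;
}

/* Result depends only on the argument, so reuse is safe.
 * The counter is instrumentation for the example only. */
static int executions;
static long square(long x) { executions++; return x * x; }

/* Has a side effect on `ticket`, so reuse changes program behavior. */
static long ticket;
static long next_ticket(long step) { ticket += step; return ticket; }
```

In this sketch, a second `call_with_reuse(square, 7)` returns the recorded result without executing `square`. A second `call_with_reuse(next_ticket, 1)` also returns the recorded value, although a real second execution would have advanced `ticket` and returned a different value, which is why subroutines with side effects must be excluded.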


Dynamic subroutine reuse overview

The notion of subroutines, in their various forms of functions, procedures or object methods, is widely used among software developers. A subroutine is a basic building block of a program that is often executed more than once during the program execution. A typical subroutine has a well-defined, lean interface with its calling program, consisting of a list of arguments and a returned value. Subroutines are often executed repeatedly with exactly the same argument values, and their

Instruction set

A traditional call to a subroutine is usually performed as follows: after the parameters are pushed onto the stack, the jsr instruction is executed. The return address is also pushed onto the stack, and the address of the called subroutine is copied to the PC. The proposed new branch instruction vpjsr is similar to the regular jsr, but performs several additional actions explained later in this paper. In order to exit from a subroutine call, the instruction vpret is introduced.
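The traditional call sequence described above can be sketched as a toy machine model. This is an illustration under stated assumptions, not the instruction set of any particular processor: the `Machine` structure, the upward-growing stack, the `instr_len` parameter and the plain `ret` helper are hypothetical. The additional actions of vpjsr and vpret are not modeled, since this excerpt does not describe them.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

/* Toy machine state: a program counter and a small word stack.
 * sp is the index of the next free slot (an assumption of this model). */
typedef struct {
    uint32_t pc;
    uint32_t sp;
    uint32_t stack[STACK_WORDS];
} Machine;

static void push(Machine *m, uint32_t word)
{
    assert(m->sp < STACK_WORDS);
    m->stack[m->sp++] = word;
}

static uint32_t pop(Machine *m)
{
    assert(m->sp > 0);
    return m->stack[--m->sp];
}

/* jsr: push the return address (the instruction following the call),
 * then copy the address of the called subroutine to the PC. */
static void jsr(Machine *m, uint32_t target, uint32_t instr_len)
{
    push(m, m->pc + instr_len);
    m->pc = target;
}

/* Conventional return: pop the return address back into the PC. */
static void ret(Machine *m)
{
    m->pc = pop(m);
}
```

A caller in this model first pushes the parameters with `push`, then executes `jsr`; after `ret`, the PC points at the instruction following the call and the parameters are still on the stack for the caller to discard.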

The vpjsr instruction is

Performance evaluation

The proposed method was evaluated by modifying the benchmark programs such that calls to subroutines whose arguments total 16 bytes or less were replaced with calls to other subroutines that do the same thing. For instance, calls to the function RGB2HSV(x) were replaced by calls to the subroutine my_RGB2HSV(x) described below:

    HSV my_RGB2HSV(RGB x)
    {
        unsigned char parameters[SIZE_OF_PARAMETERS];
        unsigned char ret_val_buffer[SIZE_OF_RETURNED_VALUE];
        HSV returned_value;
        memcpy(parameters,
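The listing above is cut off in this excerpt, so the following is not a reconstruction of the author's code. It is a minimal sketch of how a wrapper of this shape could emulate a table lookup in software, assuming a hypothetical direct-mapped table (`table`, `Entry`, `TABLE_ENTRIES`, `slot_of`), hypothetical three-byte `RGB` and `HSV` types, and a simplified stand-in for `RGB2HSV` that omits the hue computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types; the paper's definitions are not in this excerpt. */
typedef struct { unsigned char r, g, b; } RGB;
typedef struct { unsigned char h, s, v; } HSV;

enum { TABLE_ENTRIES = 64 };   /* assumed table size */

typedef struct {
    int valid;
    unsigned char parameters[sizeof(RGB)];
    unsigned char ret_val_buffer[sizeof(HSV)];
} Entry;

static Entry table[TABLE_ENTRIES];
static unsigned long real_calls;   /* instrumentation for the example */

/* Simplified stand-in for the real conversion (hue is omitted). */
static HSV RGB2HSV(RGB x)
{
    unsigned char max = x.r > x.g ? x.r : x.g;
    unsigned char min = x.r < x.g ? x.r : x.g;
    HSV out;

    if (x.b > max) max = x.b;
    if (x.b < min) min = x.b;
    out.h = 0;
    out.s = (unsigned char)(max - min);
    out.v = max;
    real_calls++;
    return out;
}

/* Map the raw argument bytes to a table slot (simple polynomial hash). */
static unsigned slot_of(const unsigned char *bytes, size_t n)
{
    unsigned h = 0;
    for (size_t i = 0; i < n; i++)
        h = h * 31u + bytes[i];
    return h % TABLE_ENTRIES;
}

HSV my_RGB2HSV(RGB x)
{
    unsigned char parameters[sizeof(RGB)];
    HSV returned_value;
    Entry *e;

    assert(sizeof parameters <= 16);   /* argument size limit from the text */
    memcpy(parameters, &x, sizeof x);
    e = &table[slot_of(parameters, sizeof parameters)];

    if (e->valid && memcmp(e->parameters, parameters, sizeof parameters) == 0) {
        /* Hit: reuse the recorded bytes of the returned value. */
        memcpy(&returned_value, e->ret_val_buffer, sizeof returned_value);
        return returned_value;
    }

    /* Miss: execute the subroutine and record arguments and result. */
    returned_value = RGB2HSV(x);
    e->valid = 1;
    memcpy(e->parameters, parameters, sizeof parameters);
    memcpy(e->ret_val_buffer, &returned_value, sizeof returned_value);
    return returned_value;
}
```

Comparing raw bytes keeps the wrapper independent of the argument types. If a structure contained padding bytes, the comparison could only produce extra misses, never a wrong hit, because equal bytes imply equal argument values.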

Conclusion

Dynamic elimination of repetitive identical subroutine calls can improve performance. The improvement depends on the run-time behavior of the program, and also on the size of the PSCT and its entry replacement policy. More entries in the PSCT allow more previous subroutine calls to be stored, and therefore increase the efficacy of the mechanism. The downside of this technique is that compilers and users must be aware of the mechanism and support it. The presented method can be applied only

Lior Shamir is a research scientist at Michigan Tech. He earned his bachelor’s degree in computer science from Tel Aviv University in 1999, and his Ph.D. degree in Computational Science & Engineering from Michigan Tech in 2006.

References (14)

  • A.N. Eden, T. Mudge, The YAGS branch prediction scheme, in: Proc. MICRO-31, Dallas, TX, 1998, pp....
  • M. Eleftheriou, B. Fitch, A. Rayshubskiy, T.J. Christopher, R. Germain, Performance measurements of the 3D FFT on the...
  • F. Gabbay, Speculative execution based on value prediction, EE Department Technical Report 1080, Technion – Israel...
  • J.L. Hennessy et al., Computer Architecture: A Quantitative Approach, 1996.
  • X. Huang et al., Efficient combination of multiple word models for improved sequence comparison, Bioinformatics, 2004.
  • IBM Journal of Research and Development 49(2/3), 2005, (Special issue on Blue...
  • K.V. Seshu Kumar, Value reuse optimization: reuse of evaluated math library function calls through compiler generated cache, ACM SIGPLAN Notices, 2003.
There are more references available in the full text version of this article.
