Runtime Engine for Dynamic Profile Guided Stride Prefetching

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Stride prefetching is recognized as an important technique for improving memory access performance. Prior work usually profiles and/or analyzes program behavior offline and uses the identified stride patterns to guide compilation by injecting prefetch instructions at appropriate places. Some research has tried to enable stride prefetching in runtime systems with online profiling, but these approaches either cannot discover cross-procedural prefetch opportunities or require special support in hardware or garbage collection. In this paper, we present a prefetch engine for the JVM (Java Virtual Machine). It first identifies candidate load operations during just-in-time (JIT) compilation, and then instruments the compiled code to profile the addresses of those loads. The runtime profile is collected in a trace buffer, which triggers a prefetch controller upon a protection fault. The prefetch controller analyzes the trace to discover stride patterns, then modifies the compiled code to inject prefetch instructions in place of the instrumentation. One major advantage of this engine is that it can detect striding loads at virtually any code location, in both regular and irregular code, without being limited to plain loop or procedure scopes. We found that cross-procedural patterns account for about 30% of all prefetches in the representative Java benchmarks. Another major advantage is that the engine's runtime overhead (less than 4.0% in the worst case) is much smaller than the benefit it brings. Our evaluation with the Apache Harmony JVM shows that the engine achieves an average speed-up of 6.2% on SPECjvm98 and DaCapo on an Intel Pentium 4 platform, despite the runtime overhead.
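The trace analysis step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `dominant_stride` and `prefetch_address`, the 75% dominance threshold, and the prefetch distance of 4 are all assumptions chosen for illustration. The sketch takes the addresses recorded for one instrumented load, computes the differences between consecutive addresses, and reports a stride only if a single non-zero difference dominates the trace.

```python
from collections import Counter
from typing import Optional, Sequence


def dominant_stride(addresses: Sequence[int], min_ratio: float = 0.75) -> Optional[int]:
    """Return the most frequent non-zero address delta if it dominates the trace.

    `min_ratio` is a hypothetical threshold, not a value taken from the paper.
    """
    if len(addresses) < 3:
        return None  # too few samples to call anything a pattern
    # Differences between consecutive profiled addresses of the same load.
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    stride, count = Counter(deltas).most_common(1)[0]
    # A zero stride (same address repeated) gives nothing to prefetch.
    if stride == 0 or count / len(deltas) < min_ratio:
        return None
    return stride


def prefetch_address(current: int, stride: int, distance: int = 4) -> int:
    """Address to prefetch `distance` accesses ahead (distance is an assumption)."""
    return current + stride * distance


trace = [0x1000, 0x1040, 0x1080, 0x10C0, 0x2000, 0x2040]
stride = dominant_stride(trace)
if stride is not None:
    print(hex(prefetch_address(trace[-1], stride)))
```

In this example four of the five deltas equal 64 bytes, so the load would be treated as striding even though one access jumps to an unrelated region. A real engine would also have to weigh the cost of the injected prefetch against the expected cache-miss savings, which this sketch ignores.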

Author information

Corresponding author

Correspondence to Qiong Zou.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 60673146, 60603049, 60736012, and 60703017, the National High Technology Development 863 Program of China under Grant Nos. 2006AA010201 and 2007AA01Z114, and the National Basic Research Program of China under Grant No. 2005CB321601.

About this article

Cite this article

Zou, Q., Li, XF. & Zhang, LB. Runtime Engine for Dynamic Profile Guided Stride Prefetching. J. Comput. Sci. Technol. 23, 633–643 (2008). https://doi.org/10.1007/s11390-008-9159-2

  • DOI: https://doi.org/10.1007/s11390-008-9159-2
