Regular Article
Efficient Integration of Compiler-Directed Cache Coherence and Data Prefetching

https://doi.org/10.1006/jpdc.2001.1784

Abstract

Enforcing cache coherence and reducing or hiding memory latency are important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to these problems: a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coherence by prefetching the potentially stale references in a parallel program. It also prefetches the non-stale references to hide their memory latencies. To optimize the performance of the CCDP scheme, prefetch hardware support is provided to handle these two forms of data prefetching efficiently. We also developed the compiler techniques that the CCDP scheme uses for stale reference detection, prefetch target analysis, and prefetch scheduling. We evaluated the performance of the CCDP scheme via execution-driven simulations of several numerical applications from the SPEC CFP95 and Perfect benchmark suites. The simulation results show that the CCDP scheme provides significant performance improvements for the applications studied, comparable to those obtained with a full-map hardware cache coherence scheme.
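The idea of enforcing coherence by prefetching potentially stale references can be illustrated with a toy model. The sketch below is not the CCDP implementation: the `Processor` class, its `prefetch` method, the write-through policy, and the single-word "cache lines" are all assumptions made for illustration. It only shows why refreshing a cached copy from memory before a potentially stale read yields the up-to-date value without any hardware coherence protocol.

```python
class Processor:
    """Toy processor with a private cache and no hardware coherence (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory, modeled as a dict
        self.cache = {}       # private cache: address -> value

    def read(self, addr):
        # A plain read hits in the cache whenever a copy exists, even a stale one.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Assumption: write-through, so main memory always holds the latest value.
        self.cache[addr] = value
        self.memory[addr] = value

    def prefetch(self, addr):
        # Coherence-enforcing prefetch: always fetch from memory,
        # replacing any cached copy of the same address.
        self.cache[addr] = self.memory[addr]


def run(use_prefetch):
    memory = {"x": 1}
    p0, p1 = Processor(memory), Processor(memory)

    p0.read("x")      # epoch 1: P0 caches x = 1
    p1.write("x", 2)  # epoch 1: P1 updates x; P0's cached copy is now stale

    # Epoch boundary (synchronization point).
    if use_prefetch:
        # Issued because the read below was marked potentially stale;
        # here that marking is assumed rather than computed by a compiler.
        p0.prefetch("x")

    return p0.read("x")  # epoch 2


print(run(use_prefetch=False))  # stale value from P0's cache
print(run(use_prefetch=True))   # fresh value fetched by the prefetch
```

In this model the prefetch does double duty: it removes the stale copy and brings the fresh data into the cache ahead of its use, which is the combination of coherence enforcement and latency hiding that the abstract describes.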

This work is supported in part by the National Science Foundation under Grants MIP 93-07910, MIP 94-96320, CDA 95-02979, and MIP 96-10379. Additional support is provided by a gift from Cray Research, Inc., and by a gift from Intel Corporation.
