Abstract
The success of large-scale, hierarchical and distributed shared memory systems hinges on our ability to reduce delays resulting from remote accesses to shared data. To this end, we present a compile-time algorithm for analyzing programs with doall-style parallelism to determine when read and write accesses to shared data are redundant (unnecessary). Once identified, redundant remote accesses can be replaced by local accesses or eliminated entirely. This optimization improves program performance in two ways. First, slow memory accesses are replaced by faster ones. Second, the time to perform the remaining remote memory accesses may be reduced as a result of the decreased traffic. We also show how the information obtained through redundancy analysis can be used for other compiler optimizations such as prefetching and cache management.
Cite this article
Granston, E.D., Veidenbaum, A.V. Combining flow and dependence analyses to expose redundant array accesses. Int J Parallel Prog 23, 423–470 (1995). https://doi.org/10.1007/BF02577773
DOI: https://doi.org/10.1007/BF02577773