Abstract
The performance of scientific programs on modern processors can be significantly degraded by the memory references that arise from the load and store operations associated with array references. We have developed techniques for optimally allocating registers to array elements whose values are repeatedly referenced over one or more loop iterations. The resulting placement of loads and stores is optimal in that the number of loads and stores encountered along each path through the loop is minimal for the given program branching structure. Placing load, store, and register-to-register shift operations without introducing fully or partially redundant memory operations, or dead ones, requires a detailed value flow analysis of array references. We present an analysis framework that efficiently solves the various data flow problems required by array load-store optimizations. The framework determines the collective behavior of recurrent references spread over multiple loop iterations. We also demonstrate how our algorithms can be adapted for various fine-grain architectures.
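As a rough illustration of the kind of transformation described above, the following Python sketch keeps array values that are reused across loop iterations in local variables standing in for registers, and moves them with register-to-register shifts instead of reloading them. It is not the paper's algorithm: the recurrence, the `CountingArray` helper, and the function names `naive` and `pipelined` are all hypothetical, chosen only to make the load and store counts observable.

```python
class CountingArray:
    """Hypothetical list wrapper that counts element loads and stores."""

    def __init__(self, values):
        self.data = list(values)
        self.loads = 0
        self.stores = 0

    def __getitem__(self, i):
        self.loads += 1          # every element read counts as one load
        return self.data[i]

    def __setitem__(self, i, value):
        self.stores += 1         # every element write counts as one store
        self.data[i] = value

    def __len__(self):
        return len(self.data)    # length query is not a memory reference


def naive(a):
    """Each iteration reloads a[i-1] and a[i-2] from memory."""
    for i in range(2, len(a)):
        a[i] = a[i - 1] + a[i - 2]


def pipelined(a):
    """Values reused across iterations stay in 'registers' r1 and r2."""
    if len(a) < 2:
        return
    r1 = a[1]                    # initial loads, placed before the loop
    r2 = a[0]
    for i in range(2, len(a)):
        r0 = r1 + r2             # compute from registers, no loads
        a[i] = r0                # the store is still required
        r2 = r1                  # register-to-register shifts carry the
        r1 = r0                  # values into the next iteration


x = CountingArray([1, 1, 0, 0, 0, 0, 0, 0])
y = CountingArray([1, 1, 0, 0, 0, 0, 0, 0])
naive(x)
pipelined(y)
print(x.loads, x.stores)
print(y.loads, y.stores)
```

Both versions produce the same array contents and the same number of stores; the second performs only the two loads placed ahead of the loop. The loop here has a single path, so it does not exercise the branching-structure aspect of the optimality claim.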
Additional information
Partially supported by National Science Foundation Presidential Young Investigator Award CCR-9157371 to the University of Pittsburgh and a grant from Hewlett-Packard Laboratories.
About this article
Cite this article
Bodík, R., Gupta, R. Array Data Flow Analysis for Load-Store Optimizations in Fine-Grain Architectures. Int J Parallel Prog 24, 481–512 (1996). https://doi.org/10.1007/BF03356757
DOI: https://doi.org/10.1007/BF03356757