Abstract
In this paper, we propose a compilation scheme to analyze and exploit the implicit reuse of vector register data. Based on the reuse analysis, we present a translation strategy that translates vectorized loops into assembly-level vector code that exploits vector register reuse. Experimental results show that our compilation technique reduces both the execution time and the traffic between shared memory and vector registers. The techniques discussed here are simple, systematic, and easy to implement in conventional vector compilers or translators to enhance the data locality of vector registers.
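To make the idea of register-level reuse concrete, here is a minimal sketch under a simplified cost model that is our own assumption, not the translation strategy of the paper. A strip-mined loop body is modeled as a list of vector statements, each strip of length VL is processed in turn, and the model assumes there are enough vector registers to keep every array strip touched within one strip resident. The function `vector_memory_ops` and its statement encoding are hypothetical. It counts vector loads and stores with and without reusing a strip that is already held in a register.

```python
from math import ceil


def vector_memory_ops(stmts, n, vl, exploit_reuse):
    """Count vector loads and stores for a strip-mined loop body.

    Hypothetical cost model (illustration only):
      stmts: list of (target, [sources]) vector statements, e.g.
             ("A", ["B", "C"]) for A(i:i+VL-1) = B(i:i+VL-1) op C(i:i+VL-1).
      n:     trip count of the original loop.
      vl:    vector length, i.e. elements per strip.
    Assumes enough vector registers to hold every array strip touched
    in one strip, so nothing is spilled.
    """
    strips = ceil(n / vl)
    loads = stores = 0
    for _ in range(strips):
        resident = set()  # arrays whose current strip is held in a register
        for target, sources in stmts:
            for src in sources:
                if exploit_reuse and src in resident:
                    continue  # reuse the register copy: no memory load
                loads += 1
                resident.add(src)
            stores += 1  # the result is still written back to memory
            resident.add(target)  # the freshly computed strip is now in a register
    return loads, stores


if __name__ == "__main__":
    # A = B + C; D = A * B over N = 256 elements with VL = 64 (4 strips)
    body = [("A", ["B", "C"]), ("D", ["A", "B"])]
    print("without reuse:", vector_memory_ops(body, 256, 64, False))
    print("with reuse:   ", vector_memory_ops(body, 256, 64, True))
```

For this two-statement body, the model gives 16 loads and 8 stores without reuse, but only 8 loads and 8 stores when the strips of A and B already sitting in registers are reused by the second statement. A real translator must also respect a finite register file and any dependences that invalidate a register copy, which this sketch deliberately ignores.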
Cite this article
Chang, CY., Chen, TS. & Sheu, JP. Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers. The Journal of Supercomputing 17, 187–204 (2000). https://doi.org/10.1023/A:1008134522009