Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers

The Journal of Supercomputing

Abstract

In this paper, we propose a compilation scheme that analyzes and exploits the implicit reuse of data held in vector registers. Based on this reuse analysis, we present a translation strategy that converts vectorized loops into assembly-level vector code that exploits vector reuse. Experimental results show that our compilation technique reduces both execution time and the traffic between shared memory and vector registers. The techniques discussed here are simple, systematic, and easy to implement in conventional vector compilers or translators to enhance the data locality of vector registers.

About this article

Cite this article

Chang, CY., Chen, TS. & Sheu, JP. Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers. The Journal of Supercomputing 17, 187–204 (2000). https://doi.org/10.1023/A:1008134522009

  • DOI: https://doi.org/10.1023/A:1008134522009
