Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers

Chang, Chih-Yung; Chen, Tzung-Shi; Sheu, Jang-Ping

doi:10.1023/A:1008134522009

Published: January 2000

Volume 17, pages 187–204 (2000)
The Journal of Supercomputing

Chih-Yung Chang¹, Tzung-Shi Chen² & Jang-Ping Sheu³

39 Accesses
2 Citations

Abstract

In this paper, we propose a compilation scheme that analyzes and exploits the implicit reuse of data held in vector registers. Based on this reuse analysis, we present a strategy that translates vectorized loops into assembly-level vector code that exploits these reuses. Experimental results show that our compilation technique improves execution time and reduces the traffic between shared memory and vector registers. The techniques discussed here are simple, systematic, and easy to implement in conventional vector compilers or translators to enhance the data locality of vector registers.
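To make "implicit reuse of vector register data" concrete, here is a hypothetical Python simulation, not the authors' translation scheme. It assumes a loop A[i][j] = B[i][j] + B[i+1][j] vectorized along j and strip-mined with vector length `vl`, and that B has at least two rows. The strip of row B[i+1] loaded in outer iteration i is exactly the B[i] strip needed in iteration i+1, so keeping it in a register removes one vector load per iteration. The `Memory` class, `vload`, `vadd`, and both loop functions are illustrative assumptions that only count loads.

```python
from typing import List

Matrix = List[List[float]]


class Memory:
    """Hypothetical shared memory that counts vector loads into registers."""

    def __init__(self, data: Matrix) -> None:
        self.data = data
        self.loads = 0

    def vload(self, row: int, col: int, vl: int) -> List[float]:
        # One vector load moves up to vl consecutive elements of a row
        # into a (simulated) vector register.
        self.loads += 1
        return self.data[row][col:col + vl]


def vadd(x: List[float], y: List[float]) -> List[float]:
    # Element-wise vector add, standing in for a vector ALU instruction.
    return [a + b for a, b in zip(x, y)]


def naive(mem: Memory, vl: int) -> Matrix:
    """A[i][j] = B[i][j] + B[i+1][j]; both operands reloaded every time."""
    n, m = len(mem.data), len(mem.data[0])
    a = [[0.0] * m for _ in range(n - 1)]
    for j in range(0, m, vl):             # strip-mined column loop
        for i in range(n - 1):
            v1 = mem.vload(i, j, vl)      # B[i][j:j+vl]
            v2 = mem.vload(i + 1, j, vl)  # B[i+1][j:j+vl]
            a[i][j:j + vl] = vadd(v1, v2)
    return a


def with_reuse(mem: Memory, vl: int) -> Matrix:
    """Same loop, but the B[i+1] strip stays in a register for iteration i+1."""
    n, m = len(mem.data), len(mem.data[0])
    a = [[0.0] * m for _ in range(n - 1)]
    for j in range(0, m, vl):
        v1 = mem.vload(0, j, vl)          # prologue: first row of this strip
        for i in range(n - 1):
            v2 = mem.vload(i + 1, j, vl)  # the only load in the loop body
            a[i][j:j + vl] = vadd(v1, v2)
            v1 = v2                       # reuse: B[i+1] becomes next B[i]
    return a


if __name__ == "__main__":
    b = [[float(10 * r + c) for c in range(8)] for r in range(5)]
    m1, m2 = Memory(b), Memory(b)
    assert naive(m1, 4) == with_reuse(m2, 4)
    print(m1.loads, m2.loads)  # prints "16 10"
```

On the 5×8 input with `vl = 4`, the reuse version issues 10 vector loads instead of 16. In general it needs n loads per column strip instead of 2(n − 1), at the cost of keeping one extra vector register live across iterations.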

Author information

Authors and Affiliations

Department of Computer and Information Science, Aletheia University, 32 Chen-Li St., Tamsui, Taipei, Taiwan
Chih-Yung Chang
Department of Information Management, Chang Jung University, Tainan, Taiwan
Tzung-Shi Chen
Department of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan
Jang-Ping Sheu

About this article

Cite this article

Chang, CY., Chen, TS. & Sheu, JP. Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers. The Journal of Supercomputing 17, 187–204 (2000). https://doi.org/10.1023/A:1008134522009

Issue Date: January 2000
DOI: https://doi.org/10.1023/A:1008134522009