Inter-array Data Regrouping

Ding, Chen; Kennedy, Ken

doi:10.1007/3-540-44905-1_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1863)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

As the speed gap between CPU and memory widens, the memory hierarchy has become the performance bottleneck for most applications because of both the high latency and the low bandwidth of direct memory access. With the recent introduction of latency-hiding strategies on modern machines, limited memory bandwidth has become the primary performance constraint and, consequently, the effective use of available memory bandwidth has become critical. Since memory data is transferred one cache block at a time, improving the utilization of cache blocks can directly improve memory bandwidth utilization and program performance. However, existing optimizations do not maximize cache-block utilization because they are intra-array; that is, they improve only data reuse within single arrays, and they do not group useful data of multiple arrays into the same cache block. In this paper, we present inter-array data regrouping, a global data transformation that first splits and then selectively regroups all data arrays in a program. The new transformation is optimal in the sense that it exploits inter-array cache-block reuse when and only when it is always profitable. When evaluated on real-world programs with both regular, contiguous data access and irregular, dynamic data access, inter-array data regrouping transforms as many as 26 arrays in a program and improves the overall performance by as much as 32%.
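
The cache-block argument in the abstract can be illustrated with a small address model. The following is a minimal sketch, not the paper's algorithm: it does not decide which arrays to regroup, it only counts how many cache blocks are touched when two arrays `a` and `b` are always accessed together. The element size, block size, array length, index set, and all function names are assumptions chosen for illustration.

```python
# Hypothetical illustration: count cache blocks touched under two layouts.
# ELEM_SIZE, BLOCK_SIZE, and all function names are assumed, not from the paper.

ELEM_SIZE = 8     # bytes per element (assumed 8-byte values)
BLOCK_SIZE = 64   # bytes per cache block (assumed)


def blocks_touched(addresses):
    """Number of distinct cache blocks covering the given byte addresses."""
    return len({addr // BLOCK_SIZE for addr in addresses})


def separate_layout(n, indices):
    """Byte addresses of a[i] and b[i] when a and b are two separate arrays."""
    base_a = 0
    base_b = n * ELEM_SIZE  # b is placed directly after a
    addrs = []
    for i in indices:
        addrs.append(base_a + i * ELEM_SIZE)  # a[i]
        addrs.append(base_b + i * ELEM_SIZE)  # b[i]
    return addrs


def regrouped_layout(indices):
    """Byte addresses of a[i] and b[i] when interleaved as a0 b0 a1 b1 ..."""
    addrs = []
    for i in indices:
        addrs.append((2 * i) * ELEM_SIZE)      # a[i]
        addrs.append((2 * i + 1) * ELEM_SIZE)  # b[i]
    return addrs


n = 1024
irregular = [0, 100, 200, 300]  # scattered accesses, far apart
sep = blocks_touched(separate_layout(n, irregular))
grp = blocks_touched(regrouped_layout(irregular))
print(sep, grp)  # prints "8 4"
```

In this model, each scattered access pulls in two blocks under the separate layout (one per array) but only one under the interleaved layout, so half as many blocks are transferred for the same useful data. For a contiguous sweep over the same index range, both layouts touch the same number of blocks, which is consistent with the abstract's claim that regrouping is applied only when it is profitable.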

Author information

Authors and Affiliations

Rice University, Houston, TX, 77005, USA
Chen Ding & Ken Kennedy

Editor information

Editors and Affiliations

Department of Computer Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0114, USA
Larry Carter & Jeanne Ferrante

Cite this paper

Ding, C., Kennedy, K. (2000). Inter-array Data Regrouping. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_10

DOI: https://doi.org/10.1007/3-540-44905-1_10
Published: 12 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67858-8
Online ISBN: 978-3-540-44905-8
eBook Packages: Springer Book Archive