Inter-array Data Regrouping

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 1999)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1863)

Abstract

As the speed gap between CPU and memory widens, the memory hierarchy has become the performance bottleneck for most applications because of both the high latency and low bandwidth of direct memory access. With the recent introduction of latency-hiding strategies on modern machines, limited memory bandwidth has become the primary performance constraint and, consequently, the effective use of available memory bandwidth has become critical. Since memory data is transferred one cache block at a time, improving the utilization of cache blocks can directly improve memory bandwidth utilization and program performance. However, existing optimizations do not maximize cache-block utilization because they are intra-array; that is, they improve only data reuse within single arrays, and they do not group useful data of multiple arrays into the same cache block. In this paper, we present inter-array data regrouping, a global data transformation that first splits and then selectively regroups all data arrays in a program. The new transformation is optimal in the sense that it exploits inter-array cache-block reuse when and only when it is always profitable. When evaluated on real-world programs with both regular contiguous data access and irregular, dynamic data access, inter-array data regrouping transforms as many as 26 arrays in a program and improves the overall performance by as much as 32%.
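
The idea of placing useful data from multiple arrays in the same cache block can be illustrated with a small sketch. It is not the paper's algorithm, which decides globally which arrays to split and regroup; it only shows the resulting layout change for the simplest case, assuming two hypothetical arrays `a` and `b` that are always accessed together. The names `struct pair`, `regroup`, `sum_separate`, and `sum_regrouped` are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Separate layout (assumed starting point): a[i] and b[i] sit in
 * different arrays, so one loop iteration touches two cache blocks. */
static double sum_separate(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Regrouped layout (hypothetical): element i of both arrays is stored
 * side by side, so a fetched cache block carries data from both. */
struct pair {
    double a;
    double b;
};

/* Copy the two separate arrays into the interleaved layout. */
static void regroup(const double *a, const double *b,
                    struct pair *ab, size_t n) {
    assert(a != NULL && b != NULL && ab != NULL);
    for (size_t i = 0; i < n; i++) {
        ab[i].a = a[i];
        ab[i].b = b[i];
    }
}

/* Same computation as sum_separate, now reading one contiguous stream. */
static double sum_regrouped(const struct pair *ab, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += ab[i].a * ab[i].b;
    return s;
}
```

Both functions compute the same result; only the memory layout differs. Whether the regrouped form is actually faster depends on the access pattern: if some loop used `a` without `b`, the interleaved layout would drag unused data into every cache block, which is why the abstract restricts regrouping to cases where it is always profitable.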

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ding, C., Kennedy, K. (2000). Inter-array Data Regrouping. In: Carter, L., Ferrante, J. (eds) Languages and Compilers for Parallel Computing. LCPC 1999. Lecture Notes in Computer Science, vol 1863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44905-1_10

  • DOI: https://doi.org/10.1007/3-540-44905-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67858-8

  • Online ISBN: 978-3-540-44905-8

  • eBook Packages: Springer Book Archive
