Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5335)

Abstract

Difficulty of programming is one of the main impediments to the broad acceptance of multi-core systems that have no hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but it can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture designed to enable pre-fetch techniques. Memory accesses are classified at compile time into two classes: high-locality and irregular. Our approach then steers each memory reference toward one of two cache structures, each optimized for its respective access pattern. These structures are designed so that high-level compiler optimizations can aggressively unroll loops, reorder cache references, and/or transform surrounding loops, practically eliminating the software-cache overhead in the innermost loop. The cache design enables automatic pre-fetch and modulo scheduling transformations. Performance evaluation on a set of parallel NAS applications indicates that the optimized software-cache structures, combined with the proposed pre-fetch techniques, yield speed-ups between 10% and 20%.
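The difference between the two access classes can be illustrated with a small simulation. The sketch below is not the paper's implementation and is not SPE code: it is a minimal Python model, assuming a read-only, direct-mapped cache in which a list slice stands in for a DMA transfer. The names `SoftwareCache`, `sum_irregular`, `sum_high_locality`, `LINE`, and `NUM_LINES` are hypothetical, and a single cache structure with two access paths is used here, whereas the paper describes two separate structures. The point it shows is only that a high-locality reference can pay for one lookup per cache line, while an irregular reference pays for one lookup per access.

```python
LINE = 16       # elements per cache line (assumed value)
NUM_LINES = 8   # lines in the direct-mapped cache (assumed value)


class SoftwareCache:
    """Toy read-only software cache over a 'global memory' list."""

    def __init__(self, memory):
        self.memory = memory
        self.tags = [None] * NUM_LINES
        self.lines = [None] * NUM_LINES
        self.lookups = 0     # number of tag checks performed
        self.transfers = 0   # number of simulated DMA line fetches

    def lookup(self, index):
        """Return (line, base) for the line holding memory[index]."""
        self.lookups += 1
        base = index - index % LINE
        slot = (base // LINE) % NUM_LINES
        if self.tags[slot] != base:
            # Miss: the slice stands in for a DMA get into local memory.
            self.lines[slot] = self.memory[base:base + LINE]
            self.tags[slot] = base
            self.transfers += 1
        return self.lines[slot], base


def sum_irregular(cache, indices):
    """Irregular path: every access goes through a cache lookup."""
    total = 0
    for i in indices:
        line, base = cache.lookup(i)
        total += line[i - base]
    return total


def sum_high_locality(cache, start, stop):
    """High-locality path: one lookup per line, then a check-free inner loop."""
    total = 0
    i = start
    while i < stop:
        line, base = cache.lookup(i)
        end = min(stop, base + LINE)
        for j in range(i, end):   # inner loop carries no cache overhead
            total += line[j - base]
        i = end
    return total


memory = list(range(64))
regular = SoftwareCache(memory)
irregular = SoftwareCache(memory)
total_regular = sum_high_locality(regular, 0, 64)
total_irregular = sum_irregular(irregular, range(64))
print(total_regular, regular.lookups, regular.transfers)
print(total_irregular, irregular.lookups, irregular.transfers)
```

In this model both paths move the same number of lines, but the high-locality path performs far fewer tag checks. The outer, line-granularity loop is also where a transfer for the next line could plausibly be issued ahead of use; that step is not modeled here.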

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vujić, N., Gonzàlez, M., Martorell, X., Ayguadé, E. (2008). Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture. In: Amaral, J.N. (eds) Languages and Compilers for Parallel Computing. LCPC 2008. Lecture Notes in Computer Science, vol 5335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89740-8_3

  • DOI: https://doi.org/10.1007/978-3-540-89740-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89739-2

  • Online ISBN: 978-3-540-89740-8

  • eBook Packages: Computer Science, Computer Science (R0)
