Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5898)

Abstract

Software cache has been shown to be a robust approach in multi-core systems with no hardware support for transparent data transfers between local and global memories. A software cache provides the user with a transparent view of the memory architecture and considerably improves the programmability of such systems. However, this software approach can suffer from poor performance because of the considerable overhead of the software mechanisms that maintain memory consistency. This paper presents a set of alternatives to reduce the impact of these overheads. A specific write-back mechanism is introduced, based on some degree of speculation regarding the number of threads actually modifying the same cache lines. A case study based on the Cell BE processor is described. Performance evaluation indicates that the improvements due to the optimized software-cache structures, combined with the proposed code optimizations, translate into speedups of 20% to 40% compared to a traditional software cache approach.
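
The consistency problem behind the write-back mechanism can be illustrated with a toy model. The sketch below is not the paper's implementation. It assumes a per-byte dirty mask on each cached line and uses hypothetical names (`CacheLine`, `write_back_merge`, `write_back_whole`). It shows why a write-back must merge only the locally modified bytes when several threads may have modified the same line, and why the cheaper whole-line write-back is safe only when the line has a single writer, which is the kind of assumption a speculative scheme would have to check.

```python
LINE_SIZE = 8  # bytes per cache line (tiny on purpose, for illustration only)


class CacheLine:
    """Local copy of one line of global memory plus a per-byte dirty mask."""

    def __init__(self, global_mem, base):
        self.base = base
        # Copy the line from "global memory" into the local store.
        self.data = bytearray(global_mem[base:base + LINE_SIZE])
        self.dirty = [False] * LINE_SIZE

    def write(self, offset, value):
        self.data[offset] = value
        self.dirty[offset] = True


def write_back_merge(global_mem, line):
    """Conservative write-back: copy only the bytes this thread modified."""
    for i, is_dirty in enumerate(line.dirty):
        if is_dirty:
            global_mem[line.base + i] = line.data[i]


def write_back_whole(global_mem, line):
    """Cheap write-back: copy the whole line; correct only with one writer."""
    global_mem[line.base:line.base + LINE_SIZE] = line.data


def run(write_back):
    """Two simulated threads cache the same line and write different bytes."""
    global_mem = bytearray(LINE_SIZE)
    a = CacheLine(global_mem, 0)
    b = CacheLine(global_mem, 0)
    a.write(0, 1)
    b.write(1, 2)
    write_back(global_mem, a)
    write_back(global_mem, b)
    return bytes(global_mem)


merged = run(write_back_merge)
whole = run(write_back_whole)
```

In this model the merged write-back preserves both threads' updates, while the whole-line write-back loses the first thread's update because the second thread overwrites it with its stale copy of that byte.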

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vujic, N., Alvarez, L., Tallada, M.G., Martorell, X., Ayguadé, E. (2010). Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_15

  • DOI: https://doi.org/10.1007/978-3-642-13374-9_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13373-2

  • Online ISBN: 978-3-642-13374-9

  • eBook Packages: Computer Science, Computer Science (R0)
