Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories

Vujic, Nikola; Alvarez, Lluc; Tallada, Marc Gonzalez; Martorell, Xavier; Ayguadé, Eduard

doi:10.1007/978-3-642-13374-9_15


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5898)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing


Abstract

Software caches have been shown to be a robust approach in multi-core systems with no hardware support for transparent data transfers between local and global memories. A software cache provides the user with a transparent view of the memory architecture and considerably improves the programmability of such systems. However, this software approach can suffer from poor performance because of the considerable overhead of the software mechanisms that maintain memory consistency. This paper presents a set of alternatives to reduce the impact of that overhead. A specific write-back mechanism is introduced, based on some degree of speculation regarding the number of threads actually modifying the same cache lines. A case study based on the Cell BE processor is described. Performance evaluation indicates that the optimized software-cache structures, combined with the proposed code optimizations, yield speedups of 20% to 40% compared to a traditional software cache approach.
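To make the consistency overhead mentioned in the abstract concrete, the following is a minimal sketch of a conventional software cache with per-byte dirty tracking. It is not the mechanism proposed in the paper, whose details are not given on this page. All names (`sw_cache_t`, `sc_write`, `sc_flush`, and so on), the line size, and the number of lines are hypothetical, and global memory is simulated with a plain array where a Cell BE implementation would presumably issue DMA transfers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE   16   /* bytes per cache line (hypothetical value) */
#define NUM_LINES   4    /* direct-mapped lines per local cache (hypothetical) */
#define GLOBAL_SIZE 256  /* size of the simulated global memory */

/* Stand-in for global memory; an assumption replacing real DMA transfers. */
uint8_t global_mem[GLOBAL_SIZE];

typedef struct {
    int      valid;             /* line holds data from global memory */
    uint32_t tag;               /* index of the global line being cached */
    uint8_t  data[LINE_SIZE];   /* local copy of the line */
    uint8_t  dirty[LINE_SIZE];  /* 1 = byte was modified locally */
} line_t;

typedef struct {
    line_t lines[NUM_LINES];
} sw_cache_t;

void sc_init(sw_cache_t *c) { memset(c, 0, sizeof *c); }

/* Write back only the bytes this thread modified, so that bytes written
 * by other threads to the same line are not overwritten with stale data.
 * This per-byte merge is the kind of bookkeeping that costs time. */
void sc_writeback(line_t *l) {
    if (!l->valid) return;
    uint8_t *dst = global_mem + l->tag * LINE_SIZE;
    for (int i = 0; i < LINE_SIZE; i++)
        if (l->dirty[i]) dst[i] = l->data[i];
    memset(l->dirty, 0, LINE_SIZE);
}

/* Return the line holding addr, evicting the current occupant on a miss. */
line_t *sc_lookup(sw_cache_t *c, uint32_t addr) {
    assert(addr < GLOBAL_SIZE);
    uint32_t tag = addr / LINE_SIZE;
    line_t *l = &c->lines[tag % NUM_LINES];
    if (!l->valid || l->tag != tag) {
        sc_writeback(l);  /* flush the evicted line's dirty bytes */
        memcpy(l->data, global_mem + tag * LINE_SIZE, LINE_SIZE);
        memset(l->dirty, 0, LINE_SIZE);
        l->tag = tag;
        l->valid = 1;
    }
    return l;
}

uint8_t sc_read(sw_cache_t *c, uint32_t addr) {
    return sc_lookup(c, addr)->data[addr % LINE_SIZE];
}

void sc_write(sw_cache_t *c, uint32_t addr, uint8_t value) {
    line_t *l = sc_lookup(c, addr);
    l->data[addr % LINE_SIZE] = value;
    l->dirty[addr % LINE_SIZE] = 1;
}

/* Write back every line, e.g. at a synchronization point. */
void sc_flush(sw_cache_t *c) {
    for (int i = 0; i < NUM_LINES; i++)
        sc_writeback(&c->lines[i]);
}
```

In this sketch, two caches that each write a different byte of the same line can both flush without losing the other's update, at the price of a per-byte check on every write-back. If only one thread were known to modify a line, the whole line could be copied back in a single transfer; the speculation described in the abstract concerns the number of threads modifying the same cache lines, though how the paper exploits it is not shown here.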



Author information

Authors and Affiliations

Barcelona Supercomputing Center – Centro Nacional de Supercomputación,
Nikola Vujic, Lluc Alvarez, Xavier Martorell & Eduard Ayguadé
Technical University of Catalonia,
Marc Gonzalez Tallada, Xavier Martorell & Eduard Ayguadé


Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Delaware, 19716, Newark, DE, USA
Guang R. Gao & Xiaoming Li
Department of Computer and Information Sciences, University of Delaware, 19716, Newark, DE, USA
Lori L. Pollock & John Cavazos


About this paper

Cite this paper

Vujic, N., Alvarez, L., Tallada, M.G., Martorell, X., Ayguadé, E. (2010). Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_15


DOI: https://doi.org/10.1007/978-3-642-13374-9_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science, Computer Science (R0)
