L1 Collective Cache: Managing Shared Data for Chip Multiprocessors

Jiang, Guanjun; Fen, Degui; Tong, Liangliang; Xiang, Lingxiang; Wang, Chao; Chen, Tianzhou

doi:10.1007/978-3-642-03644-6_10

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

In recent years, with further improvements in single-processor performance possibly coming to an end, more and more researchers have shifted to Chip Multiprocessors (CMPs). The growth of multi-threaded programs brings dramatically increased inter-core communication. Unfortunately, traditional architectures fail to meet this challenge, as they carry out such communication in the last level of on-chip cache or even in memory. This paper proposes a novel approach, called Collective Cache, that differentiates accesses to shared and private data and handles data communication in the first-level cache. In the proposed cache architecture, shared data found in the last-level cache are moved into the Collective Cache, an L1 cache structure shared by all cores. We show that the proposed mechanism can greatly enhance inter-processor communication, increase the usage efficiency of the L1 cache, and simplify the data consistency protocol. Extensive analysis of this approach with Simics shows that it can reduce the L1 cache miss rate by 3.36%.
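The idea summarized in the abstract can be illustrated with a toy model. The sketch below is not the authors' design or their Simics setup. It assumes, purely for illustration, that a line counts as shared once a second core misses on it at the last level, and that a shared line is then removed from the private L1s and placed in a single L1-level structure visible to all cores. The names `LRUCache`, `ToyCollectiveCMP`, and `sharers`, the fully associative organization, and LRU replacement are all hypothetical choices.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny fully associative cache of line addresses with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # refresh recency on a hit
            return True
        return False

    def insert(self, addr):
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line

    def remove(self, addr):
        self.lines.pop(addr, None)


class ToyCollectiveCMP:
    """Toy model: one private L1 per core plus one L1-level cache shared by all cores."""

    def __init__(self, num_cores, private_lines, collective_lines):
        self.private = [LRUCache(private_lines) for _ in range(num_cores)]
        self.collective = LRUCache(collective_lines)
        # Assumed last-level bookkeeping: line address -> cores that missed on it.
        self.sharers = {}
        self.l1_hits = 0
        self.l1_misses = 0

    def access(self, core, addr):
        # An L1 hit is a hit in the core's private L1 or in the collective cache.
        if self.private[core].lookup(addr) or self.collective.lookup(addr):
            self.l1_hits += 1
            return "hit"
        self.l1_misses += 1
        cores = self.sharers.setdefault(addr, set())
        cores.add(core)
        if len(cores) > 1:
            # Assumed policy: a line touched by two or more cores is shared,
            # so drop private copies and keep one copy in the collective cache.
            for c in cores:
                self.private[c].remove(addr)
            self.collective.insert(addr)
        else:
            self.private[core].insert(addr)
        return "miss"


# Two cores alternate on line 0x40; core 0 alone uses line 0x80.
cmp_model = ToyCollectiveCMP(num_cores=2, private_lines=4, collective_lines=4)
trace = [(0, 0x40), (1, 0x40), (0, 0x40), (1, 0x40), (0, 0x80), (0, 0x80)]
results = [cmp_model.access(core, addr) for core, addr in trace]
```

In this trace the second core's first access to `0x40` misses and promotes the line to the collective structure, after which both cores hit on it there without keeping separate private copies. The toy ignores writes, latency, and coherence traffic, so it says nothing about the miss-rate figure reported in the abstract.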

Author information

Authors and Affiliations

College of Computer Science, Zhejiang University, China
Guanjun Jiang, Degui Fen, Liangliang Tong, Lingxiang Xiang, Chao Wang & Tianzhou Chen
Department of Computer Science, Hongkong University, China
Guanjun Jiang, Degui Fen, Liangliang Tong, Lingxiang Xiang, Chao Wang & Tianzhou Chen

Editor information

Editors and Affiliations

National University of Defense Technology, Department of Computer Science, 410073, Changsha, P.R. China
Yong Dou
Ecole Polytechnique Fédérale de Lausanne (EPFL), Dépt. Physique, 1015, Lausanne, Switzerland
Ralf Gruber
HSR - Hochschule für Technik Rapperswil, Oberseestr. 10, 8640, Rapperswil, Switzerland
Josef M. Joller

About this paper

Cite this paper

Jiang, G., Fen, D., Tong, L., Xiang, L., Wang, C., Chen, T. (2009). L1 Collective Cache: Managing Shared Data for Chip Multiprocessors. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_10

DOI: https://doi.org/10.1007/978-3-642-03644-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)
