L1 Collective Cache: Managing Shared Data for Chip Multiprocessors

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Abstract

In recent years, as further improvements in single-processor performance appear to be reaching their limit, more and more researchers have turned to Chip Multiprocessors (CMPs). The rapid growth of multi-threaded programs has dramatically increased inter-core communication. Unfortunately, traditional architectures fail to meet this challenge, because they carry out such communication in the last-level on-chip cache or even in main memory. This paper proposes a novel approach, called Collective Cache, that differentiates accesses to shared and private data and handles data communication in the first-level cache. In the proposed cache architecture, shared data found in the last-level cache are moved into the Collective Cache, an L1 cache structure shared by all cores. We show that the proposed mechanism can greatly enhance inter-processor communication, increase the utilization of the L1 cache, and simplify the data consistency protocol. Extensive analysis of this approach with Simics shows that it can reduce the L1 cache miss rate by 3.36%.
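
The idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the class name, the per-block sharer bookkeeping at the last-level cache, and the rule "promote a block once a second core touches it" are all assumptions made for illustration. Capacity limits, writes, and replacement are ignored. It only shows how an access might be routed to a private L1 or to a shared L1-level structure.

```python
from collections import defaultdict


class ToyCollectiveCacheModel:
    """Toy model: one private L1 per core plus one L1-level store shared by all cores.

    Everything here is a hypothetical illustration, not the paper's mechanism.
    """

    def __init__(self, num_cores):
        self.private_l1 = [set() for _ in range(num_cores)]
        self.collective = set()
        # Assumed bookkeeping at the last-level cache: which cores touched a block.
        self.llc_sharers = defaultdict(set)

    def access(self, core, block):
        """Return where the access is served: 'collective', 'private', or 'llc'."""
        if block in self.collective:
            return "collective"
        if block in self.private_l1[core]:
            return "private"
        # L1 miss: the request goes to the last-level cache.
        self.llc_sharers[block].add(core)
        if len(self.llc_sharers[block]) > 1:
            # Assumed policy: a block touched by more than one core is shared,
            # so it moves to the collective store and leaves every private L1.
            for l1 in self.private_l1:
                l1.discard(block)
            self.collective.add(block)
        else:
            self.private_l1[core].add(block)
        return "llc"


model = ToyCollectiveCacheModel(num_cores=2)
trace = [(0, 0x40), (0, 0x40), (1, 0x40), (0, 0x40), (1, 0x40)]
served = [model.access(core, block) for core, block in trace]
print(served)
```

In this toy model a shared block lives in exactly one L1-level location, so there are no per-core copies of it to keep consistent. That is one plausible reading of the abstract's claim about a simpler consistency protocol, not a statement of how the paper achieves it.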

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, G., Fen, D., Tong, L., Xiang, L., Wang, C., Chen, T. (2009). L1 Collective Cache: Managing Shared Data for Chip Multiprocessors. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_10

  • DOI: https://doi.org/10.1007/978-3-642-03644-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03643-9

  • Online ISBN: 978-3-642-03644-6

  • eBook Packages: Computer Science, Computer Science (R0)