DOI: 10.1145/1882453.1882458
research-article

An adaptive cache coherence protocol for chip multiprocessors

Published: 19 June 2010

Abstract

Multi-core architectures, also referred to as chip multiprocessors (CMPs), have emerged as the dominant architecture for both desktop and high-performance systems. CMPs introduce many challenges that must be addressed to achieve the best performance. One major challenge arising from the shared-memory model of such architectures is cache coherence overhead. Contemporary architectures employ write-invalidate protocols, which are known to generate coherence misses that increase latency. Write-update protocols, on the other hand, can solve the coherence-miss problem, but they tend to generate excessive network traffic, which is especially undesirable for CMPs. Previous studies have shown that a single-protocol approach is not sufficient for many sharing patterns. As a solution, this paper evaluates an adaptive protocol that targets write-update optimizations for producer-consumer sharing patterns. The work takes a minimalistic hardware-extension approach in order to test the benefits of such adaptive protocols in a practical environment. The experimental study is conducted on a 16-core CMP using a full-system simulator with selected scientific applications from the SPLASH-2 and NAS Parallel Benchmark suites. Results show up to a 40% improvement in coherence misses, which corresponds to a 15% application speedup.
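
The trade-off between write-invalidate and write-update protocols described in the abstract can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the protocol evaluated in the paper: it tracks a single cache line shared by one producer and several consumers, and the names `simulate` and `Stats`, the message-counting rules, and the access pattern are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    coherence_misses: int = 0  # consumer reads that find an invalidated copy
    messages: int = 0          # coherence messages exchanged between cores


def simulate(protocol: str, consumers: int, rounds: int,
             writes_per_round: int) -> Stats:
    """Toy model of one cache line with one producer and several consumers.

    In each round the producer writes the line `writes_per_round` times,
    then every consumer reads it once. The message costs are simplifying
    assumptions: one message per invalidation or update, two per miss.
    """
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")

    stats = Stats()
    valid = [True] * consumers  # assume every consumer starts with a copy

    for _ in range(rounds):
        for _ in range(writes_per_round):
            if protocol == "invalidate":
                # Only consumers still holding a copy need an invalidation.
                stats.messages += sum(valid)
                valid = [False] * consumers
            else:
                # Every write pushes the new value to every consumer.
                stats.messages += consumers
        for c in range(consumers):
            if not valid[c]:
                stats.coherence_misses += 1
                stats.messages += 2  # miss request plus data reply
                valid[c] = True
    return stats


if __name__ == "__main__":
    for writes in (1, 8):
        for protocol in ("invalidate", "update"):
            print(writes, protocol, simulate(protocol, 3, 4, writes))
```

Under these assumptions, with 3 consumers and 4 rounds, write-invalidate incurs 12 coherence misses regardless of how often the producer writes, while write-update incurs none. When the producer writes 8 times per round, however, write-update sends 96 messages against 36 for write-invalidate, which is the excess-traffic problem that motivates choosing the protocol adaptively per sharing pattern.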

Information

Published In

IFMT '10: Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
June 2010
91 pages
ISBN:9781450300087
DOI:10.1145/1882453
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • E-JUST: Egypt-Japan University of Science and Technology
  • CAS: IBM Centers for Advanced Studies
  • INRIA: Institut Natl de Recherche en Info et en Automatique

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache coherence protocols
  2. chip-multiprocessors
  3. directory-based cache coherence
  4. multi-core architectures

Qualifiers

  • Research-article

Conference

IFMT '10
Sponsor:
  • E-JUST
  • CAS
  • INRIA

Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 1
Reflects downloads up to 01 Mar 2025.

Cited By

  • (2021) A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures. ACM Transactions on Design Automation of Electronic Systems, 27:1 (1-31). DOI: 10.1145/3462775. Online publication date: 13-Sep-2021
  • (2021) Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (640-653). DOI: 10.1109/HPCA51647.2021.00060. Online publication date: Feb-2021
  • (2021) Analytical Modeling the Multi-Core Shared Cache Behavior With Considerations of Data-Sharing and Coherence. IEEE Access, 9 (17728-17743). DOI: 10.1109/ACCESS.2021.3053350. Online publication date: 2021
  • (2015) Automatic sharing classification and timely push for cache-coherent systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.1145/2807591.2807649. Online publication date: 15-Nov-2015
  • (2014) Leveraging on-chip networks for efficient prediction on multicore coherence. Proceedings of the conference on Design, Automation & Test in Europe (1-4). DOI: 10.5555/2616606.2616825. Online publication date: 24-Mar-2014
  • (2014) Integrated Coherence Prediction. ACM Transactions on Design Automation of Electronic Systems, 19:3 (1-22). DOI: 10.1145/2611756. Online publication date: 23-Jun-2014
  • (2011) PEPCP. Proceedings of the 2011 International Conference on Parallel Processing (63-72). DOI: 10.1109/ICPP.2011.34. Online publication date: 13-Sep-2011
