research-article

An adaptive cache coherence protocol for chip multiprocessors

Authors:

Tarek El-Ghazawi

IFMT '10: Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies

Article No. 4, pages 1-10

https://doi.org/10.1145/1882453.1882458

Published: 19 June 2010

Abstract

Multi-core architectures, also referred to as Chip Multiprocessors (CMPs), have emerged as the dominant architecture for both desktop and high-performance systems. CMPs introduce many challenges that must be addressed to achieve the best performance. One of the major challenges stems from the shared-memory model used in such architectures: the overhead of maintaining cache coherence. Contemporary architectures employ write-invalidate protocols, which are known to generate coherence misses that increase memory access latency. Write-update protocols, on the other hand, can avoid coherence misses, but they tend to generate excessive network traffic, which is especially undesirable for CMPs. Previous studies have shown that a single-protocol approach is not sufficient for many sharing patterns. As a solution, this paper evaluates an adaptive protocol that targets write-update optimizations for producer-consumer sharing patterns. The work takes a minimal hardware extension approach in order to test the benefits of such adaptive protocols in a practical environment. The experimental study is conducted on a 16-core CMP using a full-system simulator, with selected scientific applications from the SPLASH-2 and NAS Parallel Benchmarks suites. Results show up to a 40% reduction in coherence misses, which corresponds to a 15% application speedup.
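The trade-off described above (invalidation causes coherence misses, updates cause extra traffic, and an adaptive protocol tries to use updates only where a producer-consumer pattern is detected) can be illustrated with a toy model. The sketch below is a hypothetical illustration, not the protocol evaluated in the paper: it models a single cache line with one producer, counts every invalidation, update, and miss request/reply as one message, and uses a made-up detection rule (`threshold` consecutive rounds in which every invalidated copy is re-read) to switch the line to update mode. The names `simulate`, `Stats`, `active_readers`, and `threshold` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    coherence_misses: int = 0  # reads that miss only because the copy was invalidated
    messages: int = 0          # invalidations + updates + 2 per miss (request, data reply)


def simulate(policy: str, consumers: int, active_readers: int,
             iterations: int, threshold: int = 2) -> Stats:
    """Toy model of one cache line written by one producer (hypothetical).

    All `consumers` caches start with a valid copy. Each iteration the
    producer writes the line once, then consumers 0..active_readers-1 read it.
    `policy` is "invalidate", "update", or "adaptive" (hypothetical heuristic).
    """
    if policy not in ("invalidate", "update", "adaptive"):
        raise ValueError(f"unknown policy: {policy!r}")
    if not 0 <= active_readers <= consumers:
        raise ValueError("active_readers must be between 0 and consumers")

    stats = Stats()
    valid = [True] * consumers
    use_update = policy == "update"
    pc_rounds = 0  # adaptive only: consecutive producer-consumer rounds observed

    for _ in range(iterations):
        # Producer write: one message per current sharer. Updates keep the
        # copies valid; invalidations drop them.
        sharers = sum(valid)
        stats.messages += sharers
        if not use_update:
            valid = [False] * consumers

        # Consumer reads: a dropped copy is refetched (a coherence miss).
        reread = 0
        for c in range(active_readers):
            if not valid[c]:
                stats.coherence_misses += 1
                stats.messages += 2
                valid[c] = True
                reread += 1

        # Hypothetical detector: if every invalidated copy was refetched
        # before the next write, count a producer-consumer round; after
        # `threshold` consecutive rounds, switch the line to update mode.
        # A real design would also need a way to switch back.
        if policy == "adaptive" and not use_update:
            pc_rounds = pc_rounds + 1 if sharers > 0 and reread == sharers else 0
            if pc_rounds >= threshold:
                use_update = True

    return stats


if __name__ == "__main__":
    # Four sharers, but only consumer 0 reads the line after each write.
    for policy in ("invalidate", "update", "adaptive"):
        print(policy, simulate(policy, consumers=4, active_readers=1, iterations=10))
```

Running it prints `invalidate Stats(coherence_misses=10, messages=33)`, `update Stats(coherence_misses=0, messages=40)`, and `adaptive Stats(coherence_misses=3, messages=19)`. Pure update removes the misses but keeps sending updates to the three idle sharers, while this toy adaptive variant first lets invalidation prune the idle sharers and then updates only the remaining consumer. With `active_readers=4`, pure update does best in this model (0 misses and 40 messages, versus 40 misses and 120 messages for invalidate).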

Information

Published In

ACM Other conferences

IFMT '10: Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies

June 2010

91 pages

ISBN:9781450300087

DOI:10.1145/1882453

General Chairs: Hisham El-Shishiny (World-Wide Leader of IBM Centers for Advanced Studies); Erven Rohou (INRIA Rennes, France)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

E-JUST: Egypt-Japan University of Science and Technology
CAS: IBM Centers for Advanced Studies
INRIA: Institut National de Recherche en Informatique et en Automatique

In-Cooperation

ACM: Association for Computing Machinery

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

IFMT '10: Second International Forum on Next-Generation Multicore/Manycore Technologies

June 19, 2010

Saint-Malo, France

Cited By

Gade S, Deb S (2021). A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures. ACM Transactions on Design Automation of Electronic Systems 27(1):1-31. Online publication date: 13-Sep-2021.
https://dl.acm.org/doi/10.1145/3462775

Wang Z, Weng J, Lowe-Power J, Gaur J, Nowatzki T (2021). Stream Floating: Enabling Proactive and Decentralized Cache Optimizations. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pages 640-653. Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00060

Ling M, Lu X, Wang G, Ge J (2021). Analytical Modeling the Multi-Core Shared Cache Behavior With Considerations of Data-Sharing and Coherence. IEEE Access 9:17728-17743. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3053350

Musleh M, Pai V, Kern J, Vetter J (2015). Automatic sharing classification and timely push for cache-coherent systems. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12. Online publication date: 15-Nov-2015.
https://dl.acm.org/doi/10.1145/2807591.2807649

Huang L, Fettweis G, Nebel W (2014). Leveraging on-chip networks for efficient prediction on multicore coherence. Proceedings of the Conference on Design, Automation & Test in Europe, pages 1-4. Online publication date: 24-Mar-2014.
https://dl.acm.org/doi/10.5555/2616606.2616825

Huang L, Wang Z, Xiao N, Wang Y, Dou Q (2014). Integrated Coherence Prediction. ACM Transactions on Design Automation of Electronic Systems 19(3):1-22. Online publication date: 23-Jun-2014.
https://dl.acm.org/doi/10.1145/2611756

Zeng F, Qiao L, Wang W (2011). PEPCP. Proceedings of the 2011 International Conference on Parallel Processing, pages 63-72. Online publication date: 13-Sep-2011.
https://dl.acm.org/doi/10.1109/ICPP.2011.34
