DOI: 10.1145/1248377.1248398

Proximity-aware directory-based coherence for multi-core processor architectures

Published: 09 June 2007

Abstract

As the number of cores increases on chip multiprocessors, coherence is fast becoming a central issue for multi-core performance. This is exacerbated by the fact that interconnection speeds are not scaling well with technology. This paper describes mechanisms to accelerate coherence for a multi-core architecture that has multiple private L2 caches and a scalable point-to-point interconnect between cores. These techniques exploit the differences in geometry between chip multiprocessors and traditional multiprocessor architectures.
Directory-based protocols have been proposed as a scalable alternative to snoop-based protocols. In this paper, we discuss implementations of coherence for CMPs and propose and evaluate a novel directory-based coherence scheme to improve the performance of parallel programs on such processors. Proximity-aware coherence accelerates read and write misses by initiating cache-to-cache transfers from the spatially closest sharer. This has the dual benefit of eliminating unnecessary accesses to off-chip memory and minimizing the distance over which communicated data moves across the network. The proposed schemes result in speedups of up to 74.9% for our workloads.
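
The idea of serving a miss from the spatially closest sharer can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the topology or the distance metric, so the 2D mesh layout, the Manhattan hop count, the lowest-id tie-break, and all names (`DirectoryEntry`, `hop_distance`, `choose_supplier`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical directory state for one cache line: tiles holding a copy."""
    sharers: set = field(default_factory=set)


def tile_coords(tile_id, mesh_width):
    """Map a tile id to (x, y) on an assumed row-major 2D mesh."""
    return tile_id % mesh_width, tile_id // mesh_width


def hop_distance(a, b, mesh_width):
    """Manhattan distance between two tiles (assumed proximity metric)."""
    ax, ay = tile_coords(a, mesh_width)
    bx, by = tile_coords(b, mesh_width)
    return abs(ax - bx) + abs(ay - by)


def choose_supplier(entry, requester, mesh_width):
    """Pick the sharer closest to the requester for a cache-to-cache transfer.

    Returns None when no other tile holds the line, in which case the
    request would have to be served from off-chip memory.
    """
    candidates = entry.sharers - {requester}
    if not candidates:
        return None
    # Ties are broken by the lowest tile id (an arbitrary, assumed policy).
    return min(candidates, key=lambda t: (hop_distance(requester, t, mesh_width), t))


# Example on an assumed 4x4 mesh: tile 5 misses on a line shared by tiles 3, 6, 12.
entry = DirectoryEntry(sharers={3, 6, 12})
supplier = choose_supplier(entry, requester=5, mesh_width=4)
```

In this example tile 6 is one hop from the requester while tiles 3 and 12 are three hops away, so the sketch would forward the request to tile 6 rather than to memory or a more distant sharer.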


Published In

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
June 2007
376 pages
ISBN:9781595936677
DOI:10.1145/1248377
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. chip multiprocessors
  2. coherence

Qualifiers

  • Article

Conference

SPAA07

Acceptance Rates

Overall Acceptance Rate 447 of 1,461 submissions, 31%


Cited By

  • (2017) Coding for Efficient Caching in Multicore Embedded Systems. 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 296-301. DOI: 10.1109/ISVLSI.2017.59. Online publication date: Jul-2017
  • (2016) Performance Analysis of Cache Coherence Protocols for Multi-core Architectures. Proceedings of the International Conference on Advances in Information Communication Technology & Computing, pp. 1-7. DOI: 10.1145/2979779.2979801. Online publication date: 12-Aug-2016
  • (2016) Pattern Based Cache Coherency Architecture for Embedded Manycores. Procedia Computer Science 80:C, pp. 1542-1553. DOI: 10.1016/j.procs.2016.05.481. Online publication date: 1-Jun-2016
  • (2015) Simulation based Performance Study of Cache Coherence Protocols. Proceedings of the 2015 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), pp. 125-130. DOI: 10.1109/iNIS.2015.52. Online publication date: 21-Dec-2015
  • (2015) Adaptive Cache Coherence Mechanisms with Producer–Consumer Sharing Optimization for Chip Multiprocessors. IEEE Transactions on Computers 64:2, pp. 316-328. DOI: 10.1109/TC.2013.217. Online publication date: 1-Feb-2015
  • (2015) Cluster Cache Monitor. International Journal of Parallel Programming 43:6, pp. 1054-1077. DOI: 10.1007/s10766-014-0339-0. Online publication date: 1-Dec-2015
  • (2014) Accelerated design space pruning for CMP memory architectures. Proceedings of the High Performance Computing Symposium, pp. 1-6. DOI: 10.5555/2663510.2663535. Online publication date: 13-Apr-2014
  • (2014) A Practical Data Classification Framework for Scalable and High Performance Chip-Multiprocessors. IEEE Transactions on Computers 63:12, pp. 2905-2918. DOI: 10.1109/TC.2013.161. Online publication date: 1-Dec-2014
  • (2013) Bayesian Theory Based Adaptive Proximity Data Accessing for CMP Caches. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E96.A:6, pp. 1293-1305. DOI: 10.1587/transfun.E96.A.1293. Online publication date: 2013
  • (2013) Bibliography. Multicore Technology, pp. 409-450. DOI: 10.1201/b15268-20. Online publication date: 18-Jul-2013
