research-article

Improving the accuracy of snoop filtering using stream registers

Authors: Valentina Salapura, Matthias Blumrich, Alan Gara

MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture

Pages 25 - 32

https://doi.org/10.1145/1327171.1327174

Published: 16 September 2007

Abstract

Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming problematic, both in terms of performance and power. The negative effects of coherence (snoop) traffic can be significantly mitigated through snoop filtering. Shielding each cache with a device that can squash snoop requests for addresses known not to be in cache improves performance significantly for caches that cannot perform normal load and snoop lookups simultaneously. In addition, reducing snoop lookups yields power savings.

This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of Splash-2 benchmarks on a 4-core multiprocessor illustrate tradeoffs and strengths of these two techniques. Their combination is most effective, eliminating 94-99% of all snoop requests using very few stream registers and snoop cache lines.
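The abstract does not give implementation details, so the following is only a minimal sketch of how such a combined filter might behave, not the authors' design. It assumes each stream register is a base/mask pair that conservatively summarizes the lines loaded by the local cache, and that every snoop is a write-invalidate, so a line that has just been snooped cannot be in the local cache until it is loaded again. The names `StreamRegister`, `SnoopFilter`, `on_local_load`, and `on_snoop`, the 32-byte line size, the 32-bit addresses, the register-selection policy, and the LRU snoop cache are all hypothetical choices made for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass

LINE_SHIFT = 5                             # assumption: 32-byte cache lines
FULL_MASK = (1 << (32 - LINE_SHIFT)) - 1   # assumption: 32-bit addresses


@dataclass
class StreamRegister:
    """Base/mask summary of a set of line addresses (hypothetical layout)."""
    base: int = 0
    mask: int = FULL_MASK   # 1 bits: every recorded line agrees with base here
    valid: bool = False

    def cost(self, line):
        """Number of mask bits that recording `line` would clear."""
        if not self.valid:
            return 0
        return bin((self.base ^ line) & self.mask).count("1")

    def record(self, line):
        if self.valid:
            # Stop comparing the bit positions where the new line differs.
            self.mask &= ~(self.base ^ line) & FULL_MASK
        else:
            self.mask, self.valid = FULL_MASK, True
        self.base = line

    def may_hold(self, line):
        return self.valid and ((self.base ^ line) & self.mask) == 0


class SnoopFilter:
    def __init__(self, num_stream_regs=4, snoop_cache_lines=4):
        self.regs = [StreamRegister() for _ in range(num_stream_regs)]
        self.snoop_cache = OrderedDict()   # recently snooped lines, LRU order
        self.capacity = snoop_cache_lines

    def on_local_load(self, addr):
        line = addr >> LINE_SHIFT
        # The line is cached again, so it may no longer be squashed.
        self.snoop_cache.pop(line, None)
        # Assumed policy: use the register that loses the least precision.
        min(self.regs, key=lambda r: r.cost(line)).record(line)

    def on_snoop(self, addr):
        """Return True to forward the snoop to the cache, False to squash it."""
        line = addr >> LINE_SHIFT
        if line in self.snoop_cache:       # temporal locality: seen recently
            self.snoop_cache.move_to_end(line)
            return False
        forward = any(r.may_hold(line) for r in self.regs)
        # Assumption: the snoop invalidates any local copy of the line.
        self.snoop_cache[line] = True
        if len(self.snoop_cache) > self.capacity:
            self.snoop_cache.popitem(last=False)
        return forward


f = SnoopFilter(num_stream_regs=2, snoop_cache_lines=2)
for a in (0x1000, 0x1020, 0x1040):
    f.on_local_load(a)
print(f.on_snoop(0x1020))       # True: a loaded line must be forwarded
print(f.on_snoop(0x80000000))   # False: matches no stream register
print(f.on_snoop(0x1020))       # False: squashed by the snoop cache
```

In this sketch the stream registers only ever lose precision. A practical filter would also need some way to reset or refresh them as the cache replaces its contents, which is left out here.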

Published In

MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture

September 2007

113 pages

ISBN: 9781595938077

DOI: 10.1145/1327171

Conference Chairs: Pierfrancesco Foglia (University of Pisa), Cosimo Antonio Prete (University of Pisa), Sandro Bartolini (University of Siena), Roberto Giorgi (University of Siena)

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

PACT07

Sponsor:

SIGARCH

PACT07: International Conference on Parallel Architectures and Compilation Techniques

September 16, 2007

Brasov, Romania

Acceptance Rates

Overall Acceptance Rate 6 of 9 submissions, 67%
