DOI: 10.1145/1327171.1327174
research-article

Improving the accuracy of snoop filtering using stream registers

Published: 16 September 2007

Abstract

Multi-core processors have become mainstream; they provide parallelism with relatively low complexity. As true on-chip SMPs evolve, coherence traffic between cores is becoming a problem for both performance and power. Snoop filtering can significantly mitigate the negative effects of this coherence (snoop) traffic. Shielding each cache with a device that squashes snoop requests for addresses known not to be in the cache significantly improves performance for caches that cannot perform normal load lookups and snoop lookups simultaneously. Reducing the number of snoop lookups also saves power.
This paper introduces Stream Register snoop filtering, which captures the spatial locality of multiple memory reference streams in a few registers. We propose a snoop filter that combines Stream Registers with "snoop caching", a mechanism that captures the temporal locality of frequently accessed addresses. Simulations of SPLASH-2 benchmarks on a 4-core multiprocessor illustrate the tradeoffs and strengths of the two techniques. Their combination is the most effective, eliminating 94-99% of all snoop requests while using very few stream registers and snoop cache lines.
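
The abstract does not give the register format or the update rules, so the following is only a minimal sketch of how such a combined filter might behave, not the paper's design. It assumes each stream register holds a base/mask pair over cache-line addresses, that the snoop cache remembers lines already invalidated by a forwarded snoop (a write-invalidate protocol is assumed), and that cache evictions are ignored. The class name `SnoopFilter`, its parameters, and its methods are all hypothetical.

```python
from collections import OrderedDict


class SnoopFilter:
    """Illustrative combined filter: stream registers plus a snoop cache (assumed design)."""

    def __init__(self, num_stream_regs=4, snoop_cache_lines=8, line_bits=5, addr_bits=32):
        self.line_bits = line_bits
        self.full_mask = (1 << (addr_bits - line_bits)) - 1
        self.num_stream_regs = num_stream_regs
        self.stream_regs = []              # each entry is [base, mask]
        self.snoop_cache_lines = snoop_cache_lines
        self.snoop_cache = OrderedDict()   # recently snooped lines, oldest first

    def _covered(self, line):
        # A line may be cached if it agrees with some base on every masked bit.
        return any(((base ^ line) & mask) == 0 for base, mask in self.stream_regs)

    def record_fill(self, addr):
        """Call whenever the local cache loads a line."""
        line = addr >> self.line_bits
        self.snoop_cache.pop(line, None)   # the line is cached again
        if self._covered(line):
            return
        if len(self.stream_regs) < self.num_stream_regs:
            self.stream_regs.append([line, self.full_mask])
            return
        # Assumed policy: widen the register that loses the fewest mask bits.
        reg = min(self.stream_regs,
                  key=lambda r: bin((r[0] ^ line) & r[1]).count("1"))
        reg[1] &= ~(reg[0] ^ line)

    def should_forward(self, addr):
        """Return True if the snoop must reach the cache, False if it can be squashed."""
        line = addr >> self.line_bits
        if line in self.snoop_cache:       # already invalidated by an earlier snoop
            return False
        if not self._covered(line):        # no stream register could hold this line
            return False
        self.snoop_cache[line] = True      # assumes the forwarded snoop invalidates the line
        if len(self.snoop_cache) > self.snoop_cache_lines:
            self.snoop_cache.popitem(last=False)
        return True


f = SnoopFilter(num_stream_regs=2)
f.record_fill(0x1000)
print(f.should_forward(0x80000000))  # False: outside every stream register
print(f.should_forward(0x1000))      # True: the line may be cached
print(f.should_forward(0x1000))      # False: repeated snoop hits the snoop cache
```

Both structures are conservative in this sketch: a snoop is squashed only when the line cannot be in the local cache under the stated assumptions, so the filter may forward unnecessary snoops but should not drop a required one.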

Published In

MEDEA '07: Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
September 2007
113 pages
ISBN: 9781595938077
DOI: 10.1145/1327171

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PACT07

Acceptance Rates

Overall Acceptance Rate 6 of 9 submissions, 67%
