research-article

Leveraging on-chip networks for data cache migration in chip multiprocessors

Published: 25 October 2008

Abstract

Recently, chip multiprocessors (CMPs) have arisen as the de facto design for modern high-performance processors, with increasing core counts. An important property of CMPs is that remote, but on-chip, L2 cache accesses are less costly than off-chip accesses; this is in contrast to earlier chip-to-chip or board-to-board multiprocessors, where an access to a remote node is just as costly as, if not more costly than, a main memory access. This motivates on-chip cache migration as a means to retain more data on-chip. However, previously proposed techniques do not scale to high core counts: they neither leverage the on-chip caches of all cores nor have a scalable migration mechanism. In this paper we propose a scalable in-network migration technique which uses hints embedded within the router microarchitecture to steer L2 cache evictions towards free/invalid cache slots in any on-chip core cache, rather than evicting them off-chip. We show that our technique can provide an average 19% reduction in the number of off-chip memory accesses over the state-of-the-art, beating the performance of a pseudo-optimal migration technique. This can be done with negligible area overhead and a manageable traffic overhead of 13.4%.
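The idea of steering evictions with router-resident hints can be illustrated with a toy model. The sketch below is not the paper's mechanism: the `Mesh` class, the one-hop advertisement radius, and the rule that a router stores a single hint are all assumptions made for illustration. It only shows the general flow the abstract describes: a tile with a free/invalid slot leaves hints in nearby routers, and an evicting tile follows its local hint instead of writing the line off-chip.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[int, int]


def manhattan(a: Coord, b: Coord) -> int:
    """Hop distance between two tiles on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


class Mesh:
    """Toy 2D mesh of tiles, each with an L2 slice and a router (hypothetical model)."""

    def __init__(self, width: int, height: int) -> None:
        tiles = [(x, y) for x in range(width) for y in range(height)]
        # Number of free/invalid L2 slots per tile; every cache starts full.
        self.free: Dict[Coord, int] = {t: 0 for t in tiles}
        # One hint per router: a tile believed to hold a free slot (assumption).
        self.hints: Dict[Coord, Optional[Coord]] = {t: None for t in tiles}
        self.off_chip_writes = 0

    def invalidate(self, tile: Coord, radius: int = 1) -> None:
        """A line in `tile` becomes invalid; advertise the free slot nearby."""
        self.free[tile] += 1
        for router in self.hints:
            if router != tile and manhattan(router, tile) <= radius:
                self.hints[router] = tile

    def evict(self, src: Coord) -> Optional[Coord]:
        """Evict one line from `src`.

        Returns the tile that absorbed the line, or None if it went off-chip.
        """
        target = self.hints[src]
        if target is not None and self.free[target] > 0:
            self.free[target] -= 1
            if self.free[target] == 0:
                # The slot is gone, so drop every hint that pointed at it.
                for router in list(self.hints):
                    if self.hints[router] == target:
                        self.hints[router] = None
            return target
        # No usable hint: fall back to an ordinary off-chip eviction.
        self.hints[src] = None
        self.off_chip_writes += 1
        return None


mesh = Mesh(3, 3)
mesh.invalidate((1, 1))            # centre tile frees one slot
first = mesh.evict((0, 1))         # neighbour holds a hint, line stays on-chip
second = mesh.evict((0, 1))        # slot is used up, line goes off-chip
far = mesh.evict((2, 2))           # two hops away, never received a hint
```

In this toy model the advertisement radius trades hint traffic against how many tiles can find a free slot, which loosely mirrors the traffic overhead the abstract reports; the actual hint placement and routing policy are defined in the paper itself.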



      Published In

      PACT '08: Proceedings of the 17th international conference on Parallel architectures and compilation techniques
      October 2008
      328 pages
      ISBN:9781605582825
      DOI:10.1145/1454115
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. CMP
      2. chip-multiprocessor
      3. interconnection network
      4. migration
      5. network-driven computing

      Qualifiers

      • Research-article

      Conference

      PACT '08

      Acceptance Rates

      Overall Acceptance Rate 121 of 471 submissions, 26%


      Cited By

      • (2015) Static Task Partitioning for Locked Caches in Multicore Real-Time Systems. ACM Transactions on Embedded Computing Systems 14:1 (1-30). DOI: 10.1145/2638557. Online publication date: 21-Jan-2015
      • (2014) Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs. ACM SIGARCH Computer Architecture News 42:1 (715-728). DOI: 10.1145/2654822.2541976. Online publication date: 24-Feb-2014
      • (2014) Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs. ACM SIGPLAN Notices 49:4 (715-728). DOI: 10.1145/2644865.2541976. Online publication date: 24-Feb-2014
      • (2014) Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs. Proceedings of the 19th international conference on Architectural support for programming languages and operating systems (715-728). DOI: 10.1145/2541940.2541976. Online publication date: 24-Feb-2014
      • (2014) Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors. International Journal of Parallel Programming 42:3 (435-455). DOI: 10.1007/s10766-013-0247-8. Online publication date: 1-Jun-2014
      • (2013) Write activity reduction on non-volatile main memories for embedded chip multiprocessors. ACM Transactions on Embedded Computing Systems 12:3 (1-27). DOI: 10.1145/2442116.2442127. Online publication date: 8-Apr-2013
      • (2012) Static task partitioning for locked caches in multi-core real-time systems. Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems (161-170). DOI: 10.1145/2380403.2380434. Online publication date: 7-Oct-2012
      • (2012) The migration prefetcher. ACM Transactions on Architecture and Code Optimization 8:4 (1-20). DOI: 10.1145/2086696.2086724. Online publication date: 26-Jan-2012
      • (2012) Bandwidth Adaptive Write-update Optimizations for Chip Multiprocessors. Proceedings of the 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications (199-206). DOI: 10.1109/ISPA.2012.34. Online publication date: 10-Jul-2012
      • (2012) Robust SIMD. Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium (107-118). DOI: 10.1109/IPDPS.2012.20. Online publication date: 21-May-2012