Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors

Authors:
Mirko Loghi, Università di Verona, Verona, Italy
Martin Letis, Università di Verona, Verona, Italy
Luca Benini, Università di Bologna, Bologna, Italy
Massimo Poncino, Politecnico di Torino, Torino, Italy

GLSVLSI '05: Proceedings of the 15th ACM Great Lakes symposium on VLSI, April 2005, Pages 276–281, https://doi.org/10.1145/1057661.1057728

Published: 17 April 2005

ABSTRACT

The performance of the various cache coherence protocols proposed in the literature has been extensively analyzed in the context of high-performance multi-processor systems. A similar analysis for Multi-Processor Systems-on-Chip (MPSoCs), where energy is at least as important as performance and where hardware and software resources are strictly constrained, has not yet been done. This work is an effort in that direction, showing energy/performance tradeoffs for different snoop-based protocols on a realistic MPSoC architecture. The analysis leverages a multi-processor simulation platform, augmented with accurate power models, that allows cycle-accurate simulations. Our analysis shows that (i) the cache write policy is more important than the cache coherence protocol itself, and (ii) matching the programming model and style to the architecture may have dramatic effects on the energy and performance of the system.

Index Terms

Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors
1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics

Published in
GLSVLSI '05: Proceedings of the 15th ACM Great Lakes symposium on VLSI
April 2005
518 pages
ISBN: 1595930574
DOI: 10.1145/1057661
General Chair: John Lach, University of Virginia
Program Chairs: Gang Qu, University of Maryland, College Park; Yehea Ismail, Northwestern University
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
cache coherence
low power
multiprocessor
system-on-chip
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 312 of 1,156 submissions, 27%