ABSTRACT
The performance of the various cache coherence protocols proposed in the literature has been extensively analyzed in the context of high-performance multi-processor systems. A similar analysis for Multi-Processor Systems-on-Chips (MPSoCs), where energy is at least as important as performance and where strict constraints on hardware and software resources exist, has not yet been done. This work is a step in that direction, showing energy/performance tradeoffs for different snoop-based protocols on a realistic MPSoC architecture. The analysis leverages a multi-processor simulation platform, augmented with accurate power models, that allows cycle-accurate simulations. Our analysis shows that (i) the cache write policy is more important than the cache coherence protocol itself, and (ii) matching the programming model and style to the architecture may have dramatic effects on the energy and performance of the system.
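To give some intuition for finding (i), the following is a minimal sketch, not the simulation platform used in this work. It assumes a hypothetical single-core, one-line cache and counts only how many writes reach the shared bus under each write policy. The function name `bus_writes`, the trace format, and the trace itself are illustrative assumptions, and no energy values are modeled.

```python
def bus_writes(trace, policy):
    """Count writes that reach the shared bus for a toy one-line cache.

    trace:  list of ("r" | "w", line_address) accesses from one core.
    policy: "write-through" or "write-back".
    """
    writes = 0
    cached_line = None  # address of the line currently held in the cache
    dirty = False       # only meaningful under write-back
    for op, addr in trace:
        if addr != cached_line:
            # Miss: a write-back cache must first flush a dirty victim line.
            if policy == "write-back" and dirty:
                writes += 1
            cached_line, dirty = addr, False
        if op == "w":
            if policy == "write-through":
                writes += 1   # every store is propagated to the bus
            else:
                dirty = True  # the store is absorbed by the cache
    # Flush the last dirty line at the end of the trace (write-back only).
    if policy == "write-back" and dirty:
        writes += 1
    return writes


# Hypothetical trace: repeated stores to line 0, then a read and stores to line 1.
trace = [("w", 0)] * 8 + [("r", 1)] + [("w", 1)] * 4

print(bus_writes(trace, "write-through"))  # 12
print(bus_writes(trace, "write-back"))     # 2
```

In this toy trace the write-through cache issues one bus write per store, while the write-back cache issues one per evicted or flushed dirty line. The gap depends entirely on the access pattern, so the sketch illustrates the mechanism only and says nothing about the magnitudes measured in the paper.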
Index Terms
- Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors