DOI: 10.1145/1736020.1736044

Conservation cores: reducing the energy of mature computations

Published: 13 March 2010

Abstract

Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0x for functions and by up to 2.1x for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.

    Published In

    ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
    March 2010
    422 pages
    ISBN: 9781605588391
    DOI: 10.1145/1736020
    General Chair: James C. Hoe
    Program Chair: Vikram S. Adve

    • ACM SIGPLAN Notices, Volume 45, Issue 3 (ASPLOS '10), March 2010, 399 pages, ISSN: 0362-1340, EISSN: 1558-1160, DOI: 10.1145/1735971
    • ACM SIGARCH Computer Architecture News, Volume 38, Issue 1 (ASPLOS '10), March 2010, 399 pages, ISSN: 0163-5964, DOI: 10.1145/1735970
    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. conservation core
    2. heterogeneous many-core
    3. patching
    4. utilization wall

    Qualifiers

    • Research-article

    Conference

    ASPLOS '10

    Acceptance Rates

    ASPLOS XV Paper Acceptance Rate 32 of 181 submissions, 18%;
    Overall Acceptance Rate 535 of 2,713 submissions, 20%
