ABSTRACT
Device-level heterogeneity promises high energy efficiency over a wider range of voltages than any single device technology can provide. In this paper, we start from device models to build ground-up models of CMOS and TFET cores, and we verify these models against existing processors. Using our core models, we construct a 32-core TFET-CMOS heterogeneous multicore. We then show that identifying the ideal runtime configuration for such a multicore is very challenging: it requires finding the best number and type of cores to activate and the voltages and frequencies at which to run them. To utilize this heterogeneous processor effectively, we propose a novel automated runtime scheme. The scheme is designed to automatically improve the performance of applications running on heterogeneous CMOS-TFET multicores under a fixed power budget, without requiring any effort from the application programmer or the user. It combines heterogeneous thread-to-core mapping, dynamic work partitioning, and dynamic power partitioning to identify energy-efficient operating points. Simulations show that our runtime scheme enables a CMOS-TFET multicore to serve diverse workloads with high energy efficiency and to achieve a 21% average speedup over the best-performing equivalent homogeneous multicore.
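To make the configuration problem concrete, the following is a minimal sketch of an exhaustive search over core counts and operating points under a fixed power budget. It is not the runtime scheme proposed in the paper. The operating-point tables, the 16/16 split between CMOS and TFET cores, the Amdahl-style performance model, and all names (`CMOS_POINTS`, `TFET_POINTS`, `speedup`, `best_configuration`) are assumptions chosen only for illustration.

```python
from itertools import product
from typing import NamedTuple, Optional

# Hypothetical operating points: (frequency in GHz, per-core power in mW).
# Placeholder values for illustration only; they are NOT taken from the paper.
CMOS_POINTS = [(1.0, 500), (1.5, 1000), (2.0, 2000)]
TFET_POINTS = [(0.5, 50), (0.75, 150), (1.0, 400)]


class Config(NamedTuple):
    speedup: float   # relative to one core running at 1 GHz
    n_cmos: int      # active CMOS cores
    f_cmos: float    # CMOS frequency (GHz)
    n_tfet: int      # active TFET cores
    f_tfet: float    # TFET frequency (GHz)
    power_mw: int    # total power of the active cores (mW)


def speedup(n_cmos, f_cmos, n_tfet, f_tfet, parallel_fraction):
    """Amdahl-style model (an assumption): the serial part runs on the
    fastest active core; the parallel part is split in proportion to speed."""
    aggregate = n_cmos * f_cmos + n_tfet * f_tfet
    fastest = max(f_cmos if n_cmos else 0.0, f_tfet if n_tfet else 0.0)
    time = (1.0 - parallel_fraction) / fastest + parallel_fraction / aggregate
    return 1.0 / time


def best_configuration(budget_mw, parallel_fraction,
                       max_cmos=16, max_tfet=16) -> Optional[Config]:
    """Try every core count and operating point; keep the fastest
    configuration whose total power fits within the budget."""
    best = None
    for n_cmos, n_tfet in product(range(max_cmos + 1), range(max_tfet + 1)):
        if n_cmos + n_tfet == 0:
            continue  # at least one core must be active
        for (f_c, p_c), (f_t, p_t) in product(CMOS_POINTS, TFET_POINTS):
            power = n_cmos * p_c + n_tfet * p_t
            if power > budget_mw:
                continue  # violates the fixed power budget
            s = speedup(n_cmos, f_c, n_tfet, f_t, parallel_fraction)
            if best is None or s > best.speedup:
                best = Config(s, n_cmos, f_c, n_tfet, f_t, power)
    return best


if __name__ == "__main__":
    for p in (0.1, 0.99):
        print(p, best_configuration(budget_mw=8000, parallel_fraction=p))
```

Even this toy model has thousands of candidate configurations, and the best one shifts with the parallel fraction of the workload, which suggests why an automated runtime approach is attractive. A real scheme would also have to account for effects this sketch ignores, such as memory behavior and the cost of changing voltage and frequency.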
Index Terms
- Performance enhancement under power constraints using heterogeneous CMOS-TFET multicores