ABSTRACT
The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need re-tailoring for the mobile market that they are now entering. The floating-point fused multiply-add unit, being a power-hungry functional unit, deserves special attention. Although clock-gating is a well-known method for reducing switching power in synchronous designs, there are unexplored opportunities for applying it to vector processors, especially in the active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector fused multiply-add units (VFU). These techniques ensure power savings without jeopardizing timing. Using vector masking and vector multi-lane-aware clock-gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector floating-point instructions. We perform this research in a fully parameterizable and automated fashion using various tools at both the architectural and circuit levels.
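To make the idea of mask-based and multi-lane-aware clock-gating concrete, the following is a minimal behavioral sketch in Python. It is not the design evaluated in this work. The function names, the round-robin mapping of element i to lane i mod num_lanes, and the "gated lane-cycle" metric are all assumptions made for illustration. The sketch only counts how many lane-cycles could have their clock disabled and does not model power.

```python
from typing import List


def lane_clock_enables(mask: List[bool], vector_length: int,
                       num_lanes: int) -> List[List[bool]]:
    """Return per-cycle, per-lane clock-enable bits for one vector instruction.

    Assumed (hypothetical) mapping: element i is processed by lane
    i % num_lanes during cycle i // num_lanes. A lane's clock is enabled
    only when it holds an element that lies within the vector length and
    whose mask bit is set.
    """
    if num_lanes <= 0:
        raise ValueError("num_lanes must be positive")
    active = min(vector_length, len(mask))
    cycles = -(-active // num_lanes)  # ceiling division
    enables = []
    for cycle in range(cycles):
        row = []
        for lane in range(num_lanes):
            idx = cycle * num_lanes + lane
            # Gate lanes past the vector length (multi-lane awareness)
            # and lanes whose element is masked off (vector masking).
            row.append(idx < active and mask[idx])
        enables.append(row)
    return enables


def gated_fraction(enables: List[List[bool]]) -> float:
    """Fraction of lane-cycles whose clock is gated (illustrative metric)."""
    total = sum(len(row) for row in enables)
    if total == 0:
        return 0.0
    gated = sum(1 for row in enables for enabled in row if not enabled)
    return gated / total


if __name__ == "__main__":
    mask = [True, False, True, True, False, False]
    enables = lane_clock_enables(mask, vector_length=6, num_lanes=4)
    for cycle, row in enumerate(enables):
        print(cycle, row)
    print(gated_fraction(enables))
```

In this toy run, the second cycle has no enabled lane at all: two elements are masked off and the remaining two lanes fall beyond the vector length, so every lane could be gated for that cycle.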
Index Terms
- A Fully Parameterizable Low Power Design of Vector Fused Multiply-Add Using Active Clock-Gating Techniques