ABSTRACT
High-performance embedded processors are frequently designed as arrays of small, in-order scalar cores, even when their workloads exhibit high degrees of data-level parallelism (DLP). We show that these multiple instruction, multiple data (MIMD) systems can be made more efficient by instead exploiting DLP directly with a modern vector architecture. In our study, we compare arrays of scalar cores to vector machines of comparable silicon area and power consumption. Because the vector machines provide greater performance across the board, in some cases with better programmability as well, we believe that embedded system designers should increasingly pursue vector architectures for machines at this scale.
Vector Processors for Energy-Efficient Embedded Systems