Abstract
Planet-scale applications are driving the exponential growth of the cloud, and datacenter specialization is the key enabler of this trend, providing orders-of-magnitude improvements in cost-effectiveness and energy efficiency. While exascale computing remains a goal for supercomputing, specialized datacenters have emerged and have demonstrated beyond-exascale performance and efficiency in specific domains. This paper generalizes the applications, design methodology, and deployment challenges of the most extreme form of specialized datacenter: ASIC Clouds. It analyzes two game-changing, real-world ASIC Clouds, Bitcoin Cryptocurrency Clouds and Tensor Processing Clouds, and discusses their incentives, their enabling technologies, and how they benefit from specialized ASICs. Their business models, architectures, and deployment methods are useful for envisioning potential future ASIC Clouds and forecasting how they will transform computing, the economy, and society.