research-article

Moonwalk: NRE Optimization in ASIC Clouds

Authors:
Moein Khazraee, Lu Zhang, Luis Vega, Michael Bedford Taylor
University of California, San Diego, San Diego, CA, USA

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, April 2017, Pages 511–526. https://doi.org/10.1145/3037697.3037749

Published: 04 April 2017


ABSTRACT

Cloud services are becoming increasingly globalized, and datacenter workloads are expanding exponentially. GPU- and FPGA-based clouds have demonstrated improvements in power and performance by accelerating compute-intensive workloads. ASIC-based clouds are a promising way to optimize the Total Cost of Ownership (TCO) of a given datacenter computation (e.g., YouTube transcoding) by reducing both energy consumption and marginal computation cost.

The feasibility of an ASIC Cloud for a particular application is directly gated by the ability to keep the Non-Recurring Engineering (NRE) cost of designing and fabricating the ASIC significantly lower (e.g., 2X) than the TCO of the best available alternative.
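The gating condition above can be read as a simple inequality: NRE multiplied by a safety margin must not exceed the TCO of the best alternative. The following is a minimal sketch under that reading only. The function name `asic_cloud_feasible`, the default margin of 2 (mirroring the "e.g., 2X" example), and all cost figures are hypothetical illustrations, not values or methods from the paper.

```python
def asic_cloud_feasible(nre: float, alt_tco: float, margin: float = 2.0) -> bool:
    """Hypothetical check: is NRE at least `margin` times lower than the alternative's TCO?

    `margin` mirrors the "e.g., 2X" example in the abstract; it is an assumption,
    not a fixed rule from the paper.
    """
    if nre < 0 or alt_tco <= 0 or margin <= 0:
        raise ValueError("NRE must be non-negative; TCO and margin must be positive")
    # Feasible when the scaled NRE still fits under the best alternative's TCO.
    return nre * margin <= alt_tco


# Placeholder figures in arbitrary currency units.
print(asic_cloud_feasible(nre=4.0e6, alt_tco=10.0e6))  # True: 8e6 <= 10e6
print(asic_cloud_feasible(nre=6.0e6, alt_tco=10.0e6))  # False: 12e6 > 10e6
```

Treating the margin as a parameter keeps the sketch neutral about how conservative a designer wants to be; the abstract gives 2X only as an example.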

In this paper, we show that technology node selection is a major tool for managing ASIC Cloud NRE, and that it allows the designer to trade an accelerator's excess energy efficiency and cost-performance for lower total cost.
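To make the node-selection trade-off concrete, here is a minimal sketch assuming a deliberately simplified cost model: each candidate node has a one-time NRE and a recurring TCO that grows linearly with the amount of work deployed, and the designer picks the node with the lowest NRE + TCO. The `NodeOption` class, `best_node` function, node labels, and every number are hypothetical placeholders, not data or the optimization method from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeOption:
    """Hypothetical per-node cost model; all numbers used with it are placeholders."""
    name: str
    nre: float                # one-time cost: design, verification, masks
    tco_per_unit_work: float  # recurring cost (hardware + energy) per unit of work

    def total_cost(self, work: float) -> float:
        # Total cost = one-time NRE + recurring TCO that scales with deployment size.
        return self.nre + self.tco_per_unit_work * work


def best_node(options: list[NodeOption], work: float) -> NodeOption:
    """Return the option with the lowest NRE + TCO for the given amount of work."""
    return min(options, key=lambda opt: opt.total_cost(work))


# Placeholder options: the advanced node costs more up front but less per unit of work.
options = [
    NodeOption("advanced_node", nre=10.0e6, tco_per_unit_work=1.0),
    NodeOption("older_node", nre=2.0e6, tco_per_unit_work=3.0),
]

for work in (1.0e6, 10.0e6):
    print(work, best_node(options, work).name)
```

Under these placeholder numbers, the smaller deployment favors `older_node` (5e6 vs. 11e6 total) while the larger one favors `advanced_node` (20e6 vs. 32e6), illustrating how excess efficiency from an advanced node only pays off when enough work amortizes its higher NRE.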

We explore NRE and cross-technology optimization of ASIC Clouds for four applications: Bitcoin mining, YouTube-style video transcoding, Litecoin, and Deep Learning. We show large reductions in NRE, potentially enabling ASIC Clouds to address a wider variety of datacenter workloads. Our results suggest that advanced nodes like 16nm will lead to sub-optimal TCO for many workloads, and that older nodes like 65nm can enable a greater diversity of ASIC Clouds.


Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Hardware


Published in
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017, 856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697
General Chairs: Yunji Chen (Institute of Computing Technology, CAS, China), Olivier Temam (Google, USA)
Program Chair: John Carter (IBM, USA)

ACM SIGARCH Computer Architecture News, Volume 45, Issue 1 (ASPLOS '17)
March 2017, 812 pages
ISSN: 0163-5964
DOI: 10.1145/3093337
Editor: Babak Falsafi (Interim)

ACM SIGPLAN Notices, Volume 52, Issue 4 (ASPLOS '17)
April 2017, 811 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3093336
Editor: Matthew Fluet
Copyright © 2017 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
accelerator
asic cloud
datacenter
nre
tco
Acceptance Rates
ASPLOS '17 Paper Acceptance Rate: 53 of 320 submissions, 17%. Overall Acceptance Rate: 535 of 2,713 submissions, 20%.