research-article

DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers

Authors:

Johann Hauswald,

Michael A. Laurenzano,

Ronald G. Dreslinski,

Lingjia Tang

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

Pages 27 - 40

https://doi.org/10.1145/2749469.2749472

Published: 13 June 2015

Abstract

As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are adopting large deep neural networks (DNN) for machine learning challenges such as image processing, speech recognition, and natural language processing, among others. A number of open questions arise as to the design of a server platform specialized for DNN and how modern warehouse scale computers (WSCs) should be outfitted to provide DNN as a service for these applications.

In this paper, we present DjiNN, an open infrastructure for DNN as a service in WSCs, and Tonic Suite, a suite of 7 end-to-end applications that span image, speech, and language processing. We use DjiNN to design a high throughput DNN system based on massive GPU server designs and provide insights as to the varying characteristics across applications. After studying the throughput, bandwidth, and power properties of DjiNN and Tonic Suite, we investigate several design points for future WSC architectures. We investigate the total cost of ownership implications of having a WSC with a disaggregated GPU pool versus a WSC composed of homogeneous integrated GPU servers. We improve DNN throughput by over 120x for all but one application (40x for Facial Recognition) on an NVIDIA K40 GPU. On a GPU server composed of 8 NVIDIA K40s, we achieve near-linear scaling (around 1000x throughput improvement) for 3 of the 7 applications. Through our analysis, we also find that GPU-enabled WSCs improve total cost of ownership over CPU-only designs by 4-20x, depending on the composition of the workload.
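
The near-linear scaling figure quoted in the abstract can be sanity-checked with simple arithmetic. The sketch below is not taken from the paper: it assumes the "over 120x" single-GPU figure is treated as exactly 120x (a lower bound) and "around 1000x" as exactly 1000x, and the function names are hypothetical.

```python
def ideal_speedup(single_gpu_speedup: float, num_gpus: int) -> float:
    """Speedup over the baseline if throughput scaled perfectly linearly with GPU count."""
    return single_gpu_speedup * num_gpus


def scaling_ratio(observed_speedup: float, single_gpu_speedup: float, num_gpus: int) -> float:
    """Observed speedup divided by the perfectly linear projection."""
    return observed_speedup / ideal_speedup(single_gpu_speedup, num_gpus)


# Figures from the abstract, rounded for illustration (assumptions, not measurements).
SINGLE_K40_SPEEDUP = 120.0   # "over 120x" on one NVIDIA K40, used as a lower bound
NUM_GPUS = 8                 # GPU server composed of 8 K40s
OBSERVED_SPEEDUP = 1000.0    # "around 1000x" on the 8-GPU server

ideal = ideal_speedup(SINGLE_K40_SPEEDUP, NUM_GPUS)
ratio = scaling_ratio(OBSERVED_SPEEDUP, SINGLE_K40_SPEEDUP, NUM_GPUS)
print(f"linear projection: {ideal:.0f}x, observed/projection: {ratio:.2f}")
```

Under these assumptions the linear projection is 960x, which is consistent with the "around 1000x" reported for 3 of the 7 applications. The ratio comes out slightly above 1 only because 120x is used as a lower bound for the single-GPU speedup.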



Information

Published In


ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture

June 2015

768 pages

ISBN:9781450334020

DOI:10.1145/2749469

General Chair: Debbie Marr (Intel)
Program Chair: David Albonesi (Cornell)

ACM SIGARCH Computer Architecture News Volume 43, Issue 3S
ISCA'15
June 2015
745 pages
ISSN:0163-5964
DOI:10.1145/2872887
Editor: Doug DeGroot

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States




Conference

ISCA '15

Sponsor:

IEEE TCCA
SIGARCH

ISCA '15: The 42nd Annual International Symposium on Computer Architecture

June 13 - 17, 2015

Portland, Oregon

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%



Bibliometrics

Total Citations: 133
Total Downloads: 993
Downloads (Last 12 months): 44
Downloads (Last 6 weeks): 7

Reflects downloads up to 20 Jan 2025


Citations

Cited By

Rodriguez-Conde I, Campos C, Fdez-Riverola F (2023). Horizontally Distributed Inference of Deep Neural Networks for AI-Enabled IoT. Sensors 23:4 (1911). Online publication date: 8-Feb-2023.
https://doi.org/10.3390/s23041911
Pei Q, Yuan Y, Hu H, Chen Q, Liu F (2023). AsyFunc. Proceedings of the 2023 ACM Symposium on Cloud Computing (324-340). Online publication date: 30-Oct-2023.
https://dl.acm.org/doi/10.1145/3620678.3624664
Lee J, Min D, Byun I, Jang H, Kim J (2023). Fast, Light-weight, and Accurate Performance Evaluation using Representative Datacenter Behaviors. Proceedings of the 24th International Middleware Conference (220-233). Online publication date: 27-Nov-2023.
https://dl.acm.org/doi/10.1145/3590140.3629117
Saini A, Shende O, Pandit M, Sen R, Ananthanarayanan G, Sha K, Banerjee S, Chen J (2023). Bang for the Buck: Evaluating the cost-effectiveness of Heterogeneous Edge Platforms for Neural Network Workloads. Proceedings of the Eighth ACM/IEEE Symposium on Edge Computing (94-107). Online publication date: 6-Dec-2023.
https://dl.acm.org/doi/10.1145/3583740.3628437
Yang Z, Sun D, Wang Y, Han X, Meng C, Huang W (2023). DTrap: A cyberattack-defense confrontation technique based on Moving Target Defense. 2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) (2652-2659). Online publication date: 1-Nov-2023.
https://doi.org/10.1109/TrustCom60117.2023.00370
Korol G, Jordan M, Rutzig M, Castrillon J, Beck A (2023). Design Space Exploration for CNN Offloading to FPGAs at the Edge. 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) (1-6). Online publication date: 20-Jun-2023.
https://doi.org/10.1109/ISVLSI59464.2023.10238644
Dong B, Liu T, Li B, Zhou X, Wang S, Xu Z (2023). WebInf: Accelerating WebGPU-based In-browser DNN Inference via Adaptive Model Partitioning. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS) (2499-2506). Online publication date: 17-Dec-2023.
https://doi.org/10.1109/ICPADS60453.2023.00333
Cruz T, Gomes J, Martins L, Albuquerque D, dos Santos G, Santos D, Damasio J (2023). Extensible Hardware Inference Accelerator for FPGA using Models from TensorFlow Lite. 2023 IEEE International Conference on Consumer Electronics (ICCE) (1-5). Online publication date: 6-Jan-2023.
https://doi.org/10.1109/ICCE56470.2023.10043475
Chow M, Jahanshahi A, Wong D (2023). KRISP: Enabling Kernel-wise RIght-sizing for Spatial Partitioned GPU Inference Servers. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (624-637). Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071121
Rodriguez-Conde I, Campos C, Fdez-Riverola F (2023). Cloud-assisted collaborative inference of convolutional neural networks for vision tasks on resource-constrained devices. Neurocomputing 560 (126835). Online publication date: Dec-2023.
https://doi.org/10.1016/j.neucom.2023.126835
