DOI: 10.1145/2749469.2749472
DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers

Published: 13 June 2015

Abstract

As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are adopting large deep neural networks (DNN) for machine learning challenges such as image processing, speech recognition, and natural language processing, among others. A number of open questions arise as to the design of a server platform specialized for DNN and how modern warehouse scale computers (WSCs) should be outfitted to provide DNN as a service for these applications.
In this paper, we present DjiNN, an open infrastructure for DNN as a service in WSCs, and Tonic Suite, a suite of 7 end-to-end applications that span image, speech, and language processing. We use DjiNN to design a high throughput DNN system based on massive GPU server designs and provide insights as to the varying characteristics across applications. After studying the throughput, bandwidth, and power properties of DjiNN and Tonic Suite, we investigate several design points for future WSC architectures. We investigate the total cost of ownership implications of having a WSC with a disaggregated GPU pool versus a WSC composed of homogeneous integrated GPU servers. We improve DNN throughput by over 120x for all but one application (40x for Facial Recognition) on an NVIDIA K40 GPU. On a GPU server composed of 8 NVIDIA K40s, we achieve near-linear scaling (around 1000x throughput improvement) for 3 of the 7 applications. Through our analysis, we also find that GPU-enabled WSCs improve total cost of ownership over CPU-only designs by 4-20x, depending on the composition of the workload.
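The relationship between the single-GPU and 8-GPU figures quoted above can be checked with a short calculation. The sketch below is only a back-of-the-envelope illustration, not the paper's methodology: it assumes the "over 120x" figure is exactly 120x and that both speedups are measured against the same baseline (the abstract does not name it). The function names `ideal_speedup` and `scaling_efficiency` are hypothetical helpers introduced here.

```python
# Back-of-the-envelope check of the scaling figures quoted in the abstract.
# Assumptions (not from the paper): the single-GPU speedup is taken as exactly
# 120x (the abstract says "over 120x"), and both figures share one baseline.

def ideal_speedup(single_gpu_speedup: float, num_gpus: int) -> float:
    """Speedup over the baseline if throughput scaled perfectly linearly."""
    return single_gpu_speedup * num_gpus


def scaling_efficiency(observed_speedup: float,
                       single_gpu_speedup: float,
                       num_gpus: int) -> float:
    """Ratio of an observed multi-GPU speedup to the ideal linear speedup."""
    return observed_speedup / ideal_speedup(single_gpu_speedup, num_gpus)


SINGLE_K40_SPEEDUP = 120.0  # lower bound quoted for all but one application
GPUS_PER_SERVER = 8         # the 8-GPU server described in the abstract

ideal = ideal_speedup(SINGLE_K40_SPEEDUP, GPUS_PER_SERVER)
print(f"Ideal linear speedup on {GPUS_PER_SERVER} GPUs: {ideal:.0f}x")
```

Under these assumptions, perfectly linear scaling gives 8 × 120 = 960x, which is consistent with the "around 1000x" reported for the 3 applications that scale near-linearly, given that 120x is stated as a lower bound.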

Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN:9781450334020
DOI:10.1145/2749469
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

ISCA '15
Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%
