
AI Tax: The Hidden Cost of AI Data Center Applications

Published: 26 March 2021

Abstract

Artificial intelligence (AI) and machine learning (ML) are experiencing widespread adoption in industry and academia. This adoption has been driven by rapid advances in the applications and accuracy of AI through increasingly complex algorithms and models, which in turn have spurred research into specialized hardware AI accelerators. Given the rapid pace of these advances, it is easy to forget that accelerators are often developed and evaluated in a vacuum, without considering the full application environment. This article emphasizes the need for a holistic, end-to-end analysis of AI workloads and reveals the “AI tax.” We deploy and characterize Face Recognition, an AI-centric edge video analytics application built with popular open-source infrastructure and ML tools, in an edge data center. Despite using state-of-the-art AI and ML algorithms, the application relies heavily on pre- and post-processing code. As AI-centric applications reap the speedups promised by accelerators, we find that they impose new stresses on the hardware and software infrastructure: storage and network bandwidth become major bottlenecks as AI acceleration increases. We show that a purpose-built edge data center, specialized for the stresses of accelerated AI, can be designed at 15% lower total cost of ownership (TCO) than one derived from homogeneous servers and infrastructure.
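
To make the “AI tax” concrete, the sketch below (Python; not the authors' code) times a toy face-recognition loop whose shape mirrors the pipeline the abstract describes: CPU-bound pre-processing, an accelerable inference step, and post-processing. Every stage function, model, and array shape here is an illustrative assumption; the point is only that end-to-end time is split across stages, so accelerating inference alone shifts the bottleneck into the surrounding code and infrastructure.

```python
# A minimal sketch of an end-to-end inference loop. All stages are
# hypothetical stand-ins, not the application's actual code.
import time
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((160 * 160 * 3, 128)).astype(np.float32)  # toy "model" weights
GALLERY = rng.standard_normal((1000, 128)).astype(np.float32)     # known-face embeddings

def preprocess(frame: np.ndarray) -> np.ndarray:
    # Stand-in for frame decode + face detection + alignment (the pre-AI work).
    return (frame.astype(np.float32) / 255.0).reshape(1, -1)

def infer(batch: np.ndarray) -> np.ndarray:
    # Stand-in for the DNN embedding step -- the part an accelerator speeds up.
    return batch @ W

def postprocess(embedding: np.ndarray) -> int:
    # Stand-in for nearest-neighbor matching + publishing the result downstream.
    return int(np.argmin(np.linalg.norm(GALLERY - embedding, axis=1)))

frame = rng.integers(0, 256, size=(160, 160, 3), dtype=np.uint8)  # synthetic frame
stage_time = {"preprocess": 0.0, "inference": 0.0, "postprocess": 0.0}
for _ in range(100):
    t0 = time.perf_counter()
    x = preprocess(frame)
    t1 = time.perf_counter()
    e = infer(x)
    t2 = time.perf_counter()
    postprocess(e)
    t3 = time.perf_counter()
    stage_time["preprocess"] += t1 - t0
    stage_time["inference"] += t2 - t1
    stage_time["postprocess"] += t3 - t2

total = sum(stage_time.values())
for stage, t in stage_time.items():
    print(f"{stage:>11}: {100 * t / total:5.1f}% of end-to-end time")
```

If infer() were made an order of magnitude faster, say by offloading to an accelerator, the pre- and post-processing shares reported by this loop would grow correspondingly; that growing non-AI share is the hidden cost the article names the AI tax.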


Cited By

  • (2024) Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision. In Proceedings of the 61st ACM/IEEE Design Automation Conference, 1-6. DOI: 10.1145/3649329.3655960. Online publication date: 23-Jun-2024.
  • (2023) ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks. ACM Transactions on Architecture and Code Optimization 20, 4, 1-24. DOI: 10.1145/3629522. Online publication date: 25-Oct-2023.


Published In

ACM Transactions on Computer Systems, Volume 37, Issue 1-4
November 2019, 177 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/3446674

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2021
Accepted: 01 November 2020
Received: 01 July 2020
Published in TOCS Volume 37, Issue 1-4


Author Tags

  1. AI tax
  2. end-to-end AI application

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (last 12 months): 313
  • Downloads (last 6 weeks): 36
Reflects downloads up to 15 Feb 2025

