
AI Tax: The Hidden Cost of AI Data Center Applications

Published: 26 March 2021

Abstract

Artificial intelligence (AI) and machine learning (ML) are experiencing widespread adoption in industry and academia. This adoption has been driven by rapid advances in the applications and accuracy of AI through increasingly complex algorithms and models, which in turn have spurred research into specialized hardware AI accelerators. Given the rapid pace of these advances, it is easy to forget that accelerators are often developed and evaluated in a vacuum, without considering the full application environment. This article emphasizes the need for a holistic, end-to-end analysis of AI workloads and reveals the “AI tax.” We deploy and characterize Face Recognition, an AI-centric edge video analytics application built with popular open-source infrastructure and ML tools, in an edge data center. Despite using state-of-the-art AI and ML algorithms, the application relies heavily on pre- and post-processing code. As AI-centric applications reap the speedups promised by accelerators, we find that they impose new stresses on the hardware and software infrastructure: storage and network bandwidth become major bottlenecks as AI acceleration increases. We show that a purpose-built edge data center, specialized for the stresses of accelerated AI, can be designed at 15% lower total cost of ownership (TCO) than one derived from homogeneous servers and infrastructure.
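
To make the “AI tax” concrete, the sketch below (Python; not the authors' code) times a toy face-recognition loop whose shape mirrors the pipeline the abstract describes: CPU-bound pre-processing, an accelerable inference step, and post-processing. Every stage function, model, and array shape here is an illustrative assumption; the point is only that end-to-end time is split across stages, so accelerating inference alone shifts the bottleneck into the surrounding code and infrastructure.

```python
# A minimal sketch of an end-to-end inference loop. All stages are
# hypothetical stand-ins, not the application's actual code.
import time
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((160 * 160 * 3, 128)).astype(np.float32)  # toy "model" weights
GALLERY = rng.standard_normal((1000, 128)).astype(np.float32)     # known-face embeddings

def preprocess(frame: np.ndarray) -> np.ndarray:
    # Stand-in for frame decode + face detection + alignment (the pre-AI work).
    return (frame.astype(np.float32) / 255.0).reshape(1, -1)

def infer(batch: np.ndarray) -> np.ndarray:
    # Stand-in for the DNN embedding step -- the part an accelerator speeds up.
    return batch @ W

def postprocess(embedding: np.ndarray) -> int:
    # Stand-in for nearest-neighbor matching + publishing the result downstream.
    return int(np.argmin(np.linalg.norm(GALLERY - embedding, axis=1)))

frame = rng.integers(0, 256, size=(160, 160, 3), dtype=np.uint8)  # synthetic frame
stage_time = {"preprocess": 0.0, "inference": 0.0, "postprocess": 0.0}
for _ in range(100):
    t0 = time.perf_counter()
    x = preprocess(frame)
    t1 = time.perf_counter()
    e = infer(x)
    t2 = time.perf_counter()
    postprocess(e)
    t3 = time.perf_counter()
    stage_time["preprocess"] += t1 - t0
    stage_time["inference"] += t2 - t1
    stage_time["postprocess"] += t3 - t2

total = sum(stage_time.values())
for stage, t in stage_time.items():
    print(f"{stage:>11}: {100 * t / total:5.1f}% of end-to-end time")
```

If infer() were made an order of magnitude faster, say by offloading to an accelerator, the pre- and post-processing shares reported by this loop would grow correspondingly; that growing non-AI share is the hidden cost the article names the AI tax.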


Cited By

  • (2024) Beyond Inference: Performance Analysis of DNN Server Overheads for Computer Vision. In Proceedings of the 61st ACM/IEEE Design Automation Conference, 1-6. DOI: 10.1145/3649329.3655960. Online publication date: 23-Jun-2024.
  • (2023) ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks. ACM Transactions on Architecture and Code Optimization 20, 4, 1-24. DOI: 10.1145/3629522. Online publication date: 25-Oct-2023.


Published In

ACM Transactions on Computer Systems, Volume 37, Issue 1-4
November 2019, 177 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/3446674

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2021
Accepted: 01 November 2020
Received: 01 July 2020
Published in TOCS Volume 37, Issue 1-4


Author Tags

  1. AI tax
  2. end-to-end AI application

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (last 12 months): 313
  • Downloads (last 6 weeks): 36
Reflects downloads up to 15 Feb 2025

