EAIBench: An Energy Efficiency Benchmark for AI Training

  • Conference paper
Benchmarking, Measuring, and Optimizing (Bench 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13852)


Abstract

The growth of computing power has driven a steady increase in the scale of artificial intelligence (AI) models. From the 341K multiply-accumulate operations (MACs) of LeNet-5 to the 4.11G MACs of ResNet-50, the computational cost of image classification has grown by roughly 10,000 times over two decades. This growth inevitably increases energy consumption, so benchmarking the energy efficiency of modern AI workloads has become essential. Existing benchmarks, such as MLPerf and AIBench, focus on performance evaluation of AI computing, with time to target accuracy (TTA) as the primary metric. The energy analogue of TTA, measuring the energy consumed until the workload reaches a specified accuracy, is a straightforward energy measurement method, but it is too time-consuming and power-hungry to be practical for energy efficiency benchmarking. This work introduces a new metric, the Energy-Delay Product of one Epoch (EEDP), to benchmark the energy efficiency of AI training workloads quickly and accurately. EEDP is the product of the energy and time consumed within one training epoch, where one epoch is one pass through the entire training dataset. It therefore captures both energy consumption and time efficiency, making it well suited to characterizing the energy efficiency of AI training workloads. We then introduce EAIBench, an AI training energy efficiency benchmark that covers different energy efficiency dimensions, including dominant layers, computation intensities, and memory accesses. Our evaluation results demonstrate that EAIBench provides reproducible and meaningful results in only dozens of minutes, hundreds of times faster than the existing AI training benchmark methodology.
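As a concrete illustration of the EEDP metric defined above, the minimal Python sketch below integrates a sampled power trace over a single training epoch and multiplies the resulting energy by the epoch's wall-clock time. The sampling scheme, function names, and example numbers are our own assumptions for illustration, not the paper's reference implementation.

```python
# Minimal sketch of the EEDP metric described in the abstract (illustrative only):
# EEDP = energy consumed in one training epoch (J) * wall-clock time of that epoch (s).

from typing import Sequence


def energy_from_power_samples(timestamps_s: Sequence[float],
                              power_w: Sequence[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule -> joules."""
    energy_j = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def eedp(epoch_time_s: float, epoch_energy_j: float) -> float:
    """Energy-Delay Product of one Epoch: epoch energy (J) times epoch duration (s)."""
    return epoch_energy_j * epoch_time_s


if __name__ == "__main__":
    # Hypothetical power trace sampled once per second during one epoch,
    # e.g. collected externally with `nvidia-smi --query-gpu=power.draw`.
    ts = [0.0, 1.0, 2.0, 3.0, 4.0]            # seconds since epoch start
    pw = [250.0, 260.0, 255.0, 258.0, 252.0]  # watts
    e_j = energy_from_power_samples(ts, pw)   # ~1024 J over a 4 s "epoch"
    print(f"epoch energy: {e_j:.1f} J, EEDP: {eedp(ts[-1] - ts[0], e_j):.1f} J*s")
```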


References

  1. https://nlp.stanford.edu/projects/nmt/

  2. Adolf, R., Rama, S., Reagen, B., Wei, G.Y., Brooks, D.: Fathom: reference workloads for modern deep learning methods. In: 2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10. IEEE (2016)

  3. Akiba, T., Suzuki, S., Fukuda, K.: Extremely large minibatch SGD: training ResNet-50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325 (2017)

  4. Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182. PMLR (2016)

  5. Baidu: Deepbench: benchmarking deep learning operations on different hardware (2017). https://github.com/baidu-research/DeepBench

  6. Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1082–1090 (2011)

  7. Coleman, C., et al.: Analysis of dawnbench, a time-to-accuracy machine learning performance benchmark. ACM SIGOPS Oper. Syst. Rev. 53(1), 14–25 (2019)

  8. Coleman, C., et al.: Dawnbench: an end-to-end deep learning benchmark and competition. Training 100(101), 102 (2017)

  9. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

  10. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  11. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  12. Gonzalez, R., Horowitz, M.: Energy dissipation in general purpose microprocessors. IEEE J. Solid-State Circuits 31(9), 1277–1284 (1996)

  13. Goyal, P., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)

  14. Hajiamini, S., Shirazi, B.A.: A study of DVFS methodologies for multicore systems with islanding feature. In: Advances in Computers, vol. 119, pp. 35–71. Elsevier (2020)

  15. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  16. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  17. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.S.: Neural collaborative filtering. In: Proceedings of the 26th International Conference on World Wide Web, pp. 173–182 (2017)

  18. He, X., et al.: Practical lessons from predicting clicks on ads at Facebook. In: Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pp. 1–9 (2014)

  19. Henning, J.L.: SPEC CPU2006 benchmark descriptions. ACM SIGARCH Comput. Archit. News 34(4), 1–17 (2006)

  20. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  21. Liu, Y., Wei, X., Xiao, J., Liu, Z., Xu, Y., Tian, Y.: Energy consumption and emission mitigation prediction based on data center traffic and PUE for global data centers. Global Energy Interconnection 3(3), 272–282 (2020)

  22. Mattson, P., et al.: MLPerf training benchmark. Proc. Mach. Learn. Syst. 2, 336–349 (2020)

  23. Miller, R.: The sustainability imperative: green data centers and our cloudy future. Tech. Rep., Data Center Frontier (2020)

  24. Naumov, M., et al.: Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019)

  25. NVIDIA: https://docs.nvidia.com/cuda/profiler-users-guide/index.html

  26. NVIDIA: Nvidia deeplearningexamples (2019). https://github.com/NVIDIA/DeepLearningExamples

  27. Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)

  28. Shi, S., Wang, Q., Xu, P., Chu, X.: Benchmarking state-of-the-art deep learning software tools. In: 2016 7th International Conference on Cloud Computing and Big Data (CCBD), pp. 99–104. IEEE (2016)

  29. Tang, F., et al.: AIBench training: balanced industry-standard AI training benchmarking. In: 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 24–35. IEEE (2021)

  30. Tang, J., Wang, K.: Ranking distillation: learning compact ranking models with high performance for recommender system. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2289–2298 (2018)

  31. Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)

  32. Wang, Y., et al.: Benchmarking the performance and energy efficiency of AI accelerators for AI training. In: 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 744–751. IEEE (2020)

  33. Yao, C., et al.: Evaluating and analyzing the energy efficiency of CNN inference on high-performance GPU. Concurr. Comput. Pract. Exp. 33(6), e6064 (2021)

  34. Zhu, H., et al.: TBD: benchmarking and analyzing deep neural network training. arXiv preprint arXiv:1803.06905 (2018)

Author information

Corresponding author

Correspondence to Jianfeng Zhan.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, F. et al. (2023). EAIBench: An Energy Efficiency Benchmark for AI Training. In: Gainaru, A., Zhang, C., Luo, C. (eds) Benchmarking, Measuring, and Optimizing. Bench 2022. Lecture Notes in Computer Science, vol 13852. Springer, Cham. https://doi.org/10.1007/978-3-031-31180-2_2

  • DOI: https://doi.org/10.1007/978-3-031-31180-2_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-31179-6

  • Online ISBN: 978-3-031-31180-2

  • eBook Packages: Computer Science, Computer Science (R0)
