Abstract
The growth of computing power has driven ever-larger artificial intelligence (AI) models. From the 341K multiply-accumulate operations (MACs) of LeNet-5 to the 4.11G MACs of ResNet-50, the computational cost of image classification has increased 10,000-fold over two decades. This growth has inevitably raised energy consumption, so benchmarking the energy efficiency of modern AI workloads has become essential. Existing benchmarks such as MLPerf and AIBench focus on the performance of AI computing, with time-to-accuracy (TTA) as the primary metric. The straightforward energy counterpart of TTA is to measure the energy consumed until the workload reaches a target accuracy. However, this is too time-consuming and power-hungry to be acceptable for energy efficiency benchmarking. This work introduces a new metric, the Energy-Delay Product of one Epoch (EEDP), to benchmark the energy efficiency of AI training workloads quickly and accurately. EEDP is the product of the energy and time consumed within one training epoch, where one epoch is one training pass over the entire training dataset. It therefore reflects both energy consumption and time efficiency, making it well suited to characterizing the energy efficiency of AI training workloads. We then introduce EAIBench, an energy efficiency benchmark for AI training whose workloads cover different energy efficiency dimensions, including dominant layers, computation intensities, and memory access patterns. Our evaluation shows that EAIBench delivers reproducible and meaningful results in only dozens of minutes, hundreds of times faster than existing AI training benchmarking methods.
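The abstract does not include the authors' measurement tooling, but the EEDP metric is easy to operationalize. Below is a minimal sketch, assuming a single NVIDIA GPU: power is sampled via `nvidia-smi` while one epoch runs, integrated into joules, and multiplied by the epoch time. The helper names (`sample_gpu_power`, `eedp_of_one_epoch`) and the `train_one_epoch` callable are illustrative assumptions, not EAIBench's actual implementation.

```python
import subprocess
import threading
import time

def sample_gpu_power(samples, stop_event, interval_s=0.1):
    """Append (timestamp, watts) pairs by polling nvidia-smi until stopped."""
    while not stop_event.is_set():
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        samples.append((time.time(), float(out.strip().splitlines()[0])))
        time.sleep(interval_s)

def eedp_of_one_epoch(train_one_epoch):
    """Run one epoch and return (energy in joules, time in seconds, EEDP)."""
    samples, stop = [], threading.Event()
    sampler = threading.Thread(target=sample_gpu_power, args=(samples, stop))
    start = time.time()
    sampler.start()
    train_one_epoch()  # hypothetical callable: one pass over the training set
    stop.set()
    sampler.join()
    elapsed = time.time() - start
    # Trapezoidal integration of the power trace gives energy in joules.
    energy = sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )
    return energy, elapsed, energy * elapsed  # EEDP = E_epoch * T_epoch
```

Multiplying energy by delay, in the spirit of the classic energy-delay product of Gonzalez and Horowitz, penalizes configurations that save energy only by running slower, which is why EEDP captures time efficiency as well as raw consumption.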
References
Adolf, R., Rama, S., Reagen, B., Wei, G.Y., Brooks, D.: Fathom: reference workloads for modern deep learning methods. In: 2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10. IEEE (2016)
Akiba, T., Suzuki, S., Fukuda, K.: Extremely large minibatch SGD: training ResNet-50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325 (2017)
Amodei, D., et al.: Deep speech 2: end-to-end speech recognition in English and Mandarin. In: International Conference on Machine Learning, pp. 173–182. PMLR (2016)
Baidu: DeepBench: benchmarking deep learning operations on different hardware (2017). https://github.com/baidu-research/DeepBench
Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1082–1090 (2011)
Coleman, C., et al.: Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark. ACM SIGOPS Oper. Syst. Rev. 53(1), 14–25 (2019)
Coleman, C., et al.: DAWNBench: an end-to-end deep learning benchmark and competition. In: NIPS ML Systems Workshop (2017)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Gonzalez, R., Horowitz, M.: Energy dissipation in general purpose microprocessors. IEEE J. Solid-State Circuits 31(9), 1277–1284 (1996)
Goyal, P., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677 (2017)
Hajiamini, S., Shirazi, B.A.: A study of DVFS methodologies for multicore systems with islanding feature. In: Advances in Computers, vol. 119, pp. 35–71. Elsevier (2020)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.S.: Neural collaborative filtering. In: Proceedings of the 26th International Conference on World Wide Web, pp. 173–182 (2017)
He, X., et al.: Practical lessons from predicting clicks on ads at Facebook. In: Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pp. 1–9 (2014)
Henning, J.L.: SPEC CPU2006 benchmark descriptions. ACM SIGARCH Comput. Archit. News 34(4), 1–17 (2006)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Liu, Y., Wei, X., Xiao, J., Liu, Z., Xu, Y., Tian, Y.: Energy consumption and emission mitigation prediction based on data center traffic and PUE for global data centers. Global Energy Interconnection 3(3), 272–282 (2020)
Mattson, P., et al.: MLPerf training benchmark. Proc. Mach. Learn. Syst. 2, 336–349 (2020)
Miller, R.: The sustainability imperative: green data centers and our cloudy future. Tech. Rep., Data Center Frontier (2020)
Naumov, M., et al.: Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019)
NVIDIA: CUDA Profiler User's Guide. https://docs.nvidia.com/cuda/profiler-users-guide/index.html
NVIDIA: Deep Learning Examples (2019). https://github.com/NVIDIA/DeepLearningExamples
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)
Shi, S., Wang, Q., Xu, P., Chu, X.: Benchmarking state-of-the-art deep learning software tools. In: 2016 7th International Conference on Cloud Computing and Big Data (CCBD), pp. 99–104. IEEE (2016)
Tang, F., et al.: AIBench training: balanced industry-standard AI training benchmarking. In: 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 24–35. IEEE (2021)
Tang, J., Wang, K.: Ranking distillation: learning compact ranking models with high performance for recommender system. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2289–2298 (2018)
Vaswani, A., et al.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Wang, Y., et al.: Benchmarking the performance and energy efficiency of AI accelerators for AI training. In: 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 744–751. IEEE (2020)
Yao, C., et al.: Evaluating and analyzing the energy efficiency of CNN inference on high-performance GPU. Concurr. Comput. Pract. Exp. 33(6), e6064 (2021)
Zhu, H., et al.: TBD: benchmarking and analyzing deep neural network training. arXiv preprint arXiv:1803.06905 (2018)