DOI: 10.1145/3447818.3460371
research-article

Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators

Published: 04 June 2021

Abstract

DNN training consumes orders of magnitude more energy than inference and requires innovative use of accelerators to improve energy efficiency. However, despite their complementary features, GPUs and FPGAs have mostly been used independently for the entire training process, neglecting the opportunity to assign individual operations to whichever hardware suits them best. In this paper, we explore new opportunities and viable solutions for enabling energy-efficient DNN training on hybrid accelerators. To overcome fundamental challenges, including avoiding training throughput loss, enabling fast design space exploration, and scheduling operations efficiently, we propose a comprehensive framework, Hype-training, that combines offline characterization, performance modeling, and online scheduling of individual operations. Experiments using NVIDIA V100 GPUs and Intel Stratix 10 FPGAs show that Hype-training can exploit a mixture of GPUs and FPGAs at a fine granularity to achieve significant energy reductions, 44.3% on average and up to 59.7%, without any loss in training throughput. Hype-training also enforces power caps more effectively than state-of-the-art power management mechanisms on GPUs.
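To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of the fine-grained placement decision the abstract describes: each training operation is profiled offline on both devices, and an online scheduler assigns it to the device that saves energy without slowing that operation relative to a GPU-only baseline. The names (OpProfile, assign_devices) and the example numbers are illustrative assumptions, written here in Python.

```python
# Hypothetical sketch of per-operation GPU/FPGA placement based on offline
# characterization, in the spirit of the abstract; not the paper's actual code.
from dataclasses import dataclass

@dataclass
class OpProfile:
    name: str
    gpu_time_ms: float    # offline-measured execution time on the GPU
    gpu_energy_j: float   # offline-measured energy on the GPU
    fpga_time_ms: float   # offline-measured execution time on the FPGA
    fpga_energy_j: float  # offline-measured energy on the FPGA

def assign_devices(ops):
    """Pick the lower-energy device per op without slowing it below the GPU baseline.

    A real scheduler would also model GPU<->FPGA data-transfer costs and
    operation dependencies, which this sketch ignores for brevity.
    """
    placement = {}
    for op in ops:
        fpga_fast_enough = op.fpga_time_ms <= op.gpu_time_ms   # no throughput loss
        fpga_cheaper = op.fpga_energy_j < op.gpu_energy_j       # energy reduction
        placement[op.name] = "FPGA" if (fpga_fast_enough and fpga_cheaper) else "GPU"
    return placement

if __name__ == "__main__":
    # Illustrative (made-up) per-operation profiles.
    profiles = [
        OpProfile("conv2d_1",  gpu_time_ms=3.1, gpu_energy_j=0.95,
                  fpga_time_ms=5.8, fpga_energy_j=0.60),
        OpProfile("batchnorm", gpu_time_ms=0.4, gpu_energy_j=0.12,
                  fpga_time_ms=0.3, fpga_energy_j=0.05),
        OpProfile("matmul",    gpu_time_ms=2.2, gpu_energy_j=0.70,
                  fpga_time_ms=2.0, fpga_energy_j=0.40),
    ]
    print(assign_devices(profiles))
    # e.g. {'conv2d_1': 'GPU', 'batchnorm': 'FPGA', 'matmul': 'FPGA'}
```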

      Published In

      ICS '21: Proceedings of the 35th ACM International Conference on Supercomputing
      June 2021
      506 pages
ISBN: 9781450383356
DOI: 10.1145/3447818
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 04 June 2021

      Author Tags

      1. DNN training
      2. FPGA
      3. GPU
      4. energy efficiency
      5. hybrid accelerators

      Qualifiers

      • Research-article

      Conference

      ICS '21

      Acceptance Rates

      ICS '21 Paper Acceptance Rate 39 of 157 submissions, 25%;
      Overall Acceptance Rate 629 of 2,180 submissions, 29%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

• Downloads (Last 12 months): 103
• Downloads (Last 6 weeks): 19
      Reflects downloads up to 03 Mar 2025

      Citations

      Cited By

• (2024) Novel adaptive quantization methodology for 8-bit floating-point DNN training. Design Automation for Embedded Systems 28(2), 91-110. DOI: 10.1007/s10617-024-09282-2. Online publication date: 16-Feb-2024
• (2023) Two-Level Scheduling Algorithms for Deep Neural Network Inference in Vehicular Networks. IEEE Transactions on Intelligent Transportation Systems 24(9), 9324-9343. DOI: 10.1109/TITS.2023.3266795. Online publication date: 1-Sep-2023
• (2023) Performance and Energy Aware Training of a Deep Neural Network in a Multi-GPU Environment with Power Capping. Euro-Par 2023: Parallel Processing Workshops, 5-16. DOI: 10.1007/978-3-031-48803-0_1. Online publication date: 28-Aug-2023
• (2023) TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators. Euro-Par 2023: Parallel Processing, 458-473. DOI: 10.1007/978-3-031-39698-4_31. Online publication date: 28-Aug-2023
• (2022) FARNN: FPGA-GPU Hybrid Acceleration Platform for Recurrent Neural Networks. IEEE Transactions on Parallel and Distributed Systems 33(7), 1725-1738. DOI: 10.1109/TPDS.2021.3124125. Online publication date: 1-Jul-2022
