Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Published: 08 December 2022

Abstract

Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge is poorly explored. Techniques like federated learning and the growing capacity of GPU-accelerated edge devices like NVIDIA Jetson motivate the need for a holistic characterization of DNN training on the edge. Training DNNs is resource-intensive and can stress an edge device's GPU, CPU, memory and storage capacities. Edge devices also have different resources compared to workstations and servers, such as slower shared memory and diverse storage media. Here, we perform a principled study of DNN training on individual devices of three contemporary Jetson device types: AGX Xavier, Xavier NX and Nano, for three diverse DNN model–dataset combinations. We vary device and training parameters such as I/O pipelining and parallelism, storage media, mini-batch sizes and power modes, and examine their effect on CPU and GPU utilization, fetch stalls, training time, energy usage, and variability. Our analysis exposes several resource inter-dependencies and counter-intuitive insights, while also helping quantify known wisdom. Our rigorous study can help tune training performance on the edge, trade off time and energy usage on constrained devices, and even select an ideal edge hardware for a DNN workload, and, in the future, extend to federated learning too. As an illustration, we use these results to build a simple model that predicts the training time and energy per epoch for any given DNN across different power modes, with minimal additional profiling.
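To make the kind of parameter sweep described above concrete, below is a minimal illustrative sketch (not the authors' actual harness) that times one PyTorch training epoch while varying the mini-batch size and the DataLoader's worker parallelism. It assumes torchvision is available and uses ResNet-18 on CIFAR-10 as stand-ins for the paper's model–dataset combinations; on a Jetson, such a sweep could additionally be repeated under different nvpmodel power modes while logging tegrastats to relate training time to energy.

```python
# Illustrative sketch only: sweep mini-batch size and DataLoader workers and time one
# epoch. The model (ResNet-18), dataset (CIFAR-10) and hyperparameters are assumptions,
# not the paper's exact experimental setup.
import time

import torch
import torchvision
from torch.utils.data import DataLoader


def time_one_epoch(batch_size, num_workers, device):
    transform = torchvision.transforms.ToTensor()
    dataset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                           download=True, transform=transform)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers,             # fetch/pre-processing parallelism
                        pin_memory=(device.type == "cuda"))  # enables async host-to-GPU copies
    model = torchvision.models.resnet18(num_classes=10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    start = time.time()
    for images, labels in loader:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.time() - start


if __name__ == "__main__":
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for bs in (16, 32, 64):
        for workers in (0, 2, 4):
            secs = time_one_epoch(bs, workers, dev)
            print(f"batch_size={bs:3d} num_workers={workers}: {secs:.1f} s/epoch")
```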

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 3
POMACS
December 2022
534 pages
EISSN: 2476-1249
DOI: 10.1145/3576048
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022
Published in POMACS Volume 6, Issue 3

Author Tags

  1. dnn training
  2. edge accelerators
  3. performance characterization

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Education, India/PMRF
  • Department of Science and Technology, India/ICPS
