Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Published: 08 December 2022

Abstract

Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. However, DNN training on the edge is poorly explored. Techniques like federated learning and the growing capacity of GPU-accelerated edge devices like NVIDIA Jetson motivate the need for a holistic characterization of DNN training on the edge. Training DNNs is resource-intensive and can stress an edge device's GPU, CPU, memory and storage capacities. Edge devices also have different resources compared to workstations and servers, such as slower shared memory and diverse storage media. Here, we perform a principled study of DNN training on individual devices of three contemporary Jetson device types: AGX Xavier, Xavier NX and Nano, for three diverse DNN model–dataset combinations. We vary device and training parameters such as I/O pipelining and parallelism, storage media, mini-batch sizes and power modes, and examine their effect on CPU and GPU utilization, fetch stalls, training time, energy usage, and variability. Our analysis exposes several resource inter-dependencies and counter-intuitive insights, while also helping quantify known wisdom. Our rigorous study can help tune training performance on the edge, trade off time and energy usage on constrained devices, and even select an ideal edge hardware for a DNN workload, and, in the future, extend to federated learning too. As an illustration, we use these results to build a simple model that predicts the training time and energy per epoch for any given DNN across different power modes, with minimal additional profiling.
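To make the kind of parameter sweep described above concrete, below is a minimal illustrative sketch (not the authors' actual harness) that times one PyTorch training epoch while varying the mini-batch size and the DataLoader's worker parallelism. It assumes torchvision is available and uses ResNet-18 on CIFAR-10 as stand-ins for the paper's model–dataset combinations; on a Jetson, such a sweep could additionally be repeated under different nvpmodel power modes while logging tegrastats to relate training time to energy.

```python
# Illustrative sketch only: sweep mini-batch size and DataLoader workers and time one
# epoch. The model (ResNet-18), dataset (CIFAR-10) and hyperparameters are assumptions,
# not the paper's exact experimental setup.
import time

import torch
import torchvision
from torch.utils.data import DataLoader


def time_one_epoch(batch_size, num_workers, device):
    transform = torchvision.transforms.ToTensor()
    dataset = torchvision.datasets.CIFAR10(root="./data", train=True,
                                           download=True, transform=transform)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        num_workers=num_workers,             # fetch/pre-processing parallelism
                        pin_memory=(device.type == "cuda"))  # enables async host-to-GPU copies
    model = torchvision.models.resnet18(num_classes=10).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    start = time.time()
    for images, labels in loader:
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.time() - start


if __name__ == "__main__":
    dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for bs in (16, 32, 64):
        for workers in (0, 2, 4):
            secs = time_one_epoch(bs, workers, dev)
            print(f"batch_size={bs:3d} num_workers={workers}: {secs:.1f} s/epoch")
```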

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 3
POMACS
December 2022
534 pages
EISSN: 2476-1249
DOI: 10.1145/3576048
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 December 2022
Published in POMACS Volume 6, Issue 3

Author Tags

  1. dnn training
  2. edge accelerators
  3. performance characterization

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Education, India/PMRF
  • Department of Science and Technology, India/ICPS
