Abstract:
Developing software for machine- and deep-learning (ML and DL) workloads is often a daunting task for individuals with minimal programming experience or for organizations with limited engineering capacity. ML frameworks address these issues by providing a high-level API that performs otherwise complex tasks with less engineering time. This level of abstraction can reduce or hide many of the challenges induced by unclean datasets, complicated pre-/postprocessing pipelines, and low-level dependencies such as CUDA. It also encourages model portability and can dramatically increase design-iteration speed, as well as providing model speedup in some cases. This research demonstrates that these high-level ML frameworks are also more performant out of the box on embedded systems than their pure PyTorch reference implementations, likely due to their many optimizations for data movement and memory management. We benchmark a state-of-the-art transcription model, wav2vec2, and compare performance across frameworks: the reference implementation from the Fairseq framework and two higher-level frameworks, HuggingFace and Lightning Flash. Overall, we observe that both Lightning Flash and HuggingFace are substantially faster than the original unoptimized PyTorch model; these models ran between 1.8× and 2.0× faster than the base PyTorch implementation on the targeted embedded NVIDIA Jetson platforms. As a secondary result, we also observe that the high-level frameworks are more power efficient for the same computation.
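To make the "high-level API" claim concrete, the following is a minimal sketch of wav2vec2 inference through the HuggingFace transformers library. The checkpoint name (facebook/wav2vec2-base-960h), the dummy audio, and the timing loop are illustrative assumptions, not the paper's actual benchmarking harness.

import time
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load a pretrained checkpoint; this model name is an illustrative choice,
# not necessarily the checkpoint benchmarked in the paper.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# One second of silent 16 kHz audio stands in for a real utterance.
speech = np.zeros(16000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# Time repeated forward passes; a real harness would use real audio,
# warm-up iterations, and (on a GPU-equipped Jetson) torch.cuda.synchronize().
start = time.perf_counter()
with torch.no_grad():
    for _ in range(10):
        logits = model(inputs.input_values).logits
elapsed = (time.perf_counter() - start) / 10
print(f"mean latency: {elapsed:.3f} s")

# Greedy CTC decoding of the logits back to text.
transcription = processor.batch_decode(torch.argmax(logits, dim=-1))

Note that dataset handling, feature extraction, and decoding are all hidden behind the processor and model objects; the equivalent Fairseq reference path requires assembling these pieces by hand.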
Date of Conference: 25-29 September 2023
Date Added to IEEE Xplore: 25 December 2023