Abstract:
Developing software for machine- and deep-learning (ML and DL) workloads is often a daunting task for individuals with minimal programming experience or for organizations with limited engineering capacity. ML frameworks address these issues by providing a high-level API that performs otherwise complex tasks with less engineering time. This level of abstraction can reduce or hide many of the challenges induced by unclean datasets, complicated pre-/postprocessing pipelines, and low-level dependencies such as CUDA. It also encourages model portability and can dramatically increase design-iteration speed, as well as providing model speedup in some cases. This research demonstrates that these high-level ML frameworks are also more performant out of the box on embedded systems than their pure PyTorch reference implementations, likely due to their many optimizations for data movement and memory management. We benchmark a state-of-the-art transcription model, wav2vec2, and compare performance across frameworks: the reference implementation from the Fairseq framework and two higher-level frameworks, HuggingFace and Lightning Flash. Overall, we observe that both Lightning Flash and HuggingFace are substantially faster than the original unoptimized PyTorch model; these models ran between 1.8× and 2.0× faster than the base PyTorch implementation on the targeted embedded NVIDIA Jetson platforms. As a secondary result, we also observe that the high-level frameworks are more power efficient for the same computation.
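To make the "high-level API" claim concrete, the following is a minimal sketch of wav2vec2 inference through the HuggingFace transformers library. The checkpoint name (facebook/wav2vec2-base-960h), the dummy audio, and the timing loop are illustrative assumptions, not the paper's actual benchmarking harness.

import time
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load a pretrained checkpoint; this model name is an illustrative choice,
# not necessarily the checkpoint benchmarked in the paper.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# One second of silent 16 kHz audio stands in for a real utterance.
speech = np.zeros(16000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

# Time repeated forward passes; a real harness would use real audio,
# warm-up iterations, and (on a GPU-equipped Jetson) torch.cuda.synchronize().
start = time.perf_counter()
with torch.no_grad():
    for _ in range(10):
        logits = model(inputs.input_values).logits
elapsed = (time.perf_counter() - start) / 10
print(f"mean latency: {elapsed:.3f} s")

# Greedy CTC decoding of the logits back to text.
transcription = processor.batch_decode(torch.argmax(logits, dim=-1))

Note that dataset handling, feature extraction, and decoding are all hidden behind the processor and model objects; the equivalent Fairseq reference path requires assembling these pieces by hand.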
Date of Conference: 25-29 September 2023
Date Added to IEEE Xplore: 25 December 2023