Research Article | Open Access
DOI: 10.1145/3665314.3670841

PiQi: Partially Quantized DNN Inference on HMPSoCs

Published: 09 September 2024

Abstract

Deep Neural Network (DNN) inference is now ubiquitous in embedded applications at the edge. State-of-the-art Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) powering these applications come equipped with powerful Neural Processing Units (NPUs) that significantly outperform the other inference-capable HMPSoC components, namely the CPUs and GPUs, in both power consumption and performance. However, CPUs and GPUs can perform full-precision inference, whereas NPUs can often only perform quantized inference. Consequently, the low-latency, low-power inference an NPU offers comes at an accuracy loss due to quantization.
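The accuracy loss the abstract attributes to NPU execution stems from the rounding and clipping error of low-precision quantization. As a minimal, illustrative sketch (not taken from the paper; the weight values are made up), symmetric int8 post-training quantization of a weight tensor and its reconstruction error look like:

```python
# Illustrative only: symmetric int8 quantization of a small weight tensor,
# showing the bounded rounding error that accumulates into accuracy loss.

def quantize_int8(weights):
    """Map floats to int8 codes with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [x * scale for x in q]

weights = [0.31, -1.27, 0.005, 0.9981]   # fabricated example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# For values inside the clipping range, max_err is bounded by scale / 2.
```

Each layer's weight distribution sets its own scale, which is one reason (per the abstract) that different layers suffer different accuracy losses when quantized.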
DNNs consist of several heterogeneous layers. Here, we introduce the PiQi framework, which allows DNN inference to switch layer-wise between the three inference-capable HMPSoC components (CPU, GPU, and NPU) mid-inference with minimal overhead. PiQi thereby realizes the novel idea of partially quantized DNN inference on HMPSoCs. However, different DNN layers experience different power-performance gains and incur different accuracy losses under quantization. Therefore, PiQi includes a multi-objective Genetic Algorithm (GA) that produces a power-performance Pareto front under an accuracy constraint through selective multi-layer quantization during inference. Additionally, PiQi uses a neural network that predicts accuracy when assigning DNN layers to cores, expediting the search.
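The search problem described above can be sketched in simplified form: each gene assigns one layer to a core, candidates violating the accuracy budget are discarded, and survivors are reduced to a latency/power Pareto front. This is a stand-in for PiQi's NSGA-II-style GA, not the paper's implementation, and every per-layer cost number below is fabricated for illustration.

```python
# Hypothetical sketch of layer-to-core assignment search with an
# accuracy constraint and a two-objective Pareto front. All costs fabricated.
import random

CORES = ("CPU", "GPU", "NPU")
N_LAYERS = 6
random.seed(0)
# Per (layer, core): (latency_ms, power_mw, accuracy_drop). NPU is the only
# quantized core here, so only it contributes an accuracy drop.
COST = {
    (l, c): (random.uniform(1, 10),
             random.uniform(50, 500),
             0.4 if c == "NPU" else 0.0)
    for l in range(N_LAYERS) for c in CORES
}

def evaluate(assign):
    """Sum per-layer latency, power, and accuracy drop for one assignment."""
    lat = sum(COST[(l, c)][0] for l, c in enumerate(assign))
    pwr = sum(COST[(l, c)][1] for l, c in enumerate(assign))
    drop = sum(COST[(l, c)][2] for l, c in enumerate(assign))
    return lat, pwr, drop

def pareto_front(points):
    """Keep (latency, power) pairs not dominated by any other point."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                       for q in points)]

MAX_DROP = 1.0  # accuracy-loss budget: at most two NPU layers in this toy
population = [tuple(random.choice(CORES) for _ in range(N_LAYERS))
              for _ in range(200)]
feasible = [evaluate(a)[:2] for a in population if evaluate(a)[2] <= MAX_DROP]
front = pareto_front(feasible)
```

A real GA would evolve the population with crossover and mutation across generations, and PiQi additionally replaces the expensive accuracy evaluation with a learned predictor; this sketch only shows the constrained Pareto-filtering step.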


Cited By

  • (2024) Education Abstract: Design Space Exploration for Deep Learning at the Edge. In 2024 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), pp. 1-2. DOI: 10.1109/CASES60062.2024.00006. Online publication date: 29 September 2024.

Published In

ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
August 2024, 384 pages
ISBN: 9798400706882
DOI: 10.1145/3665314
This work is licensed under a Creative Commons Attribution 4.0 International License.
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. edge artificial intelligence (edge-AI)
  2. low-power design (LPD)
  3. partial quantization
  4. neural processing unit (NPU)

Conference

ISLPED '24

Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%

Article Metrics

  • Downloads (last 12 months): 156
  • Downloads (last 6 weeks): 32
Reflects downloads up to 18 February 2025.
