Research article. DOI: 10.1145/3394885.3431659

A 0.57-GOPS/DSP Object Detection PIM Accelerator on FPGA

Published: 29 January 2021

Abstract

This paper presents an object detection accelerator built on a processing-in-memory (PIM) architecture for FPGAs. PIM architectures are known for their energy efficiency and for avoiding the memory wall. Each PIM unit in the accelerator is built from BRAM and LUT-based counters, which also helps to improve the DSP performance density. The overall architecture consists of 64 PIM units and three memory buffers that store inter-layer results. A shrunk and quantized Tiny-YOLO network is mapped onto the PIM accelerator so that DRAM access is fully eliminated during inference. The design achieves a throughput of 201.6 GOPS at a 100 MHz clock rate and, correspondingly, a performance density of 0.57 GOPS/DSP.
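
The figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is not taken from the paper. It assumes that performance density is throughput divided by the number of DSP slices, and that work is spread evenly across the 64 PIM units. The function names are hypothetical, and the abstract does not state the actual DSP count.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumptions (not stated in the paper):
#   - performance density = throughput / number of DSP slices
#   - operations are spread evenly over all PIM units

THROUGHPUT_GOPS = 201.6        # from the abstract
CLOCK_MHZ = 100.0              # from the abstract
DENSITY_GOPS_PER_DSP = 0.57    # from the abstract
NUM_PIM_UNITS = 64             # from the abstract


def implied_dsp_count(throughput_gops, density_gops_per_dsp):
    """DSP slices implied by the throughput and the quoted density."""
    return throughput_gops / density_gops_per_dsp


def ops_per_cycle(throughput_gops, clock_mhz):
    """Operations completed per clock cycle across the whole design."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


if __name__ == "__main__":
    total = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ)
    per_unit = total / NUM_PIM_UNITS
    dsps = implied_dsp_count(THROUGHPUT_GOPS, DENSITY_GOPS_PER_DSP)
    print(f"ops per cycle (whole design): {total:.0f}")
    print(f"ops per cycle per PIM unit:   {per_unit:.1f}")
    print(f"implied DSP slices (approx.): {dsps:.1f}")
```

Under these assumptions the design completes about 2016 operations per cycle, or 31.5 per PIM unit, and the quoted density implies roughly 354 DSP slices. Because 0.57 GOPS/DSP is a rounded value, the implied DSP count is only approximate.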

Published In

ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
January 2021
930 pages
ISBN: 9781450379991
DOI: 10.1145/3394885

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ASPDAC '21

Acceptance Rates

ASPDAC '21 paper acceptance rate: 111 of 368 submissions (30%)
Overall acceptance rate: 466 of 1,454 submissions (32%)

Cited By

  • (2023) Ultra-High-Speed Accelerator Architecture for Convolutional Neural Network Based on Processing-in-Memory Using Resistive Random Access Memory. Sensors 23:5 (2401). DOI: 10.3390/s23052401. Online publication date: 21-Feb-2023.
  • (2022) TinyML for Ultra-Low Power AI and Large Scale IoT Deployments: A Systematic Review. Future Internet 14:12 (363). DOI: 10.3390/fi14120363. Online publication date: 6-Dec-2022.
  • (2022) Implementation and Optimization of Target Detection based on Multi-core DSP. 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), pp. 1587-1592. DOI: 10.1109/ITAIC54216.2022.9836526. Online publication date: 17-Jun-2022.
  • (2022) CA-SpaceNet: Counterfactual Analysis for 6D Pose Estimation in Space. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10627-10634. DOI: 10.1109/IROS47612.2022.9981172. Online publication date: 23-Oct-2022.
  • (2022) A review on TinyML: State-of-the-art and prospects. Journal of King Saud University - Computer and Information Sciences 34:4 (1595-1623). DOI: 10.1016/j.jksuci.2021.11.019. Online publication date: Apr-2022.
  • (2022) SeLoC-ML: Semantic Low-Code Engineering for Machine Learning Applications in Industrial IoT. The Semantic Web – ISWC 2022, pp. 845-862. DOI: 10.1007/978-3-031-19433-7_48. Online publication date: 16-Oct-2022.
