ABSTRACT
DNN-based object detection operates on large data volumes, fetching both images and DNN weights, which leads to high power and bandwidth demands. Techniques that mitigate those demands, such as weight clustering, are usually studied on examples far smaller than target applications, which makes it difficult to determine the best tradeoff to implement. This paper performs an at-scale assessment, using a real-life application, of weight clustering for a DNN-based object detection system, You Only Look Once (YOLO), considering real driving videos. Our case study shows that an Output Stationary accelerator (e.g., a systolic array) restricting weights to between 32 (5-bit) and 256 (8-bit) distinct values preserves the accuracy of the original 32-bit weights of YOLO while decreasing bandwidth requirements to around 30%-40% of the original bandwidth and overall energy consumption to around 45% of the original consumption. Overall, our case study provides key insights on which to base design decisions for an accelerator for camera-based object detection.
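The weight clustering the abstract describes restricts all weights of a layer to a small shared codebook, so each weight is stored as a short index (e.g., 5 bits for 32 clusters) plus a tiny table of centroid values. A minimal k-means sketch of this idea is shown below; `cluster_weights` is a hypothetical helper written for illustration, not the authors' implementation.

```python
import numpy as np

def cluster_weights(weights, n_clusters=32, n_iters=20):
    """Quantize a weight tensor to n_clusters shared values via 1-D k-means.
    Returns per-weight cluster indices and the centroid codebook."""
    flat = weights.ravel()
    # Linear initialization over the weight range (as used in Deep Compression).
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    # 5-bit indices (for 32 clusters) fit in uint8 storage plus a small codebook,
    # replacing 32-bit floats per weight.
    return idx.reshape(weights.shape).astype(np.uint8), centroids

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
indices, codebook = cluster_weights(w, n_clusters=32)
reconstructed = codebook[indices]  # at most 32 distinct weight values
```

Bandwidth savings follow directly from the index width: fetching 5-bit indices instead of 32-bit floats cuts weight traffic by roughly 6x before any further compression.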