DOI: 10.1145/3477314.3507161
poster

At-scale assessment of weight clustering for energy-efficient object detection accelerators

Published: 06 May 2022

ABSTRACT

DNN-based object detection fetches large volumes of data, both images and DNN weights, which leads to high power and bandwidth demands. Techniques that mitigate those demands, such as weight clustering, are normally studied on limited examples of a much smaller scale than the target applications, which makes it difficult to determine the best tradeoff to implement. This paper performs an at-scale assessment (using a real-life application) of weight clustering for a DNN-based object detection system, You Only Look Once (YOLO), considering real driving videos. Our case study shows that, for an Output Stationary accelerator (e.g., a systolic array), restricting weights to between 32 (5-bit) and 256 (8-bit) distinct values preserves the accuracy of the original 32-bit YOLO weights while decreasing bandwidth requirements to around 30%-40% of the original bandwidth and overall energy consumption to around 45% of the original consumption. Overall, our case study provides key insights on which to base design decisions for an accelerator for camera-based object detection.
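
As a rough illustration of what restricting weights to 32 or 256 distinct values means, the sketch below clusters scalar weights into a small codebook and stores a short index per weight instead of a 32-bit value. It uses a plain 1-D k-means with evenly spaced initial centroids, chosen here only for illustration; the abstract does not state which clustering algorithm or initialization the authors used. All function names (`index_bits`, `cluster_weights`, `weight_traffic_ratio`) are hypothetical, and the ratio covers weight traffic only, so it does not attempt to reproduce the overall bandwidth and energy figures quoted above.

```python
import random


def index_bits(num_clusters):
    """Bits needed to address one of num_clusters shared weight values."""
    return max(1, (num_clusters - 1).bit_length())


def _assign(weights, codebook):
    """Map every weight to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda c: abs(w - codebook[c]))
            for w in weights]


def cluster_weights(weights, num_clusters, iterations=10):
    """Cluster scalar weights with a plain 1-D k-means (Lloyd's algorithm).

    Returns (codebook, indices) where codebook[indices[i]] approximates
    weights[i]. Evenly spaced initial centroids are an assumption of this
    sketch, not something taken from the paper.
    """
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / max(1, num_clusters - 1)
    codebook = [lo + i * step for i in range(num_clusters)]
    for _ in range(iterations):
        indices = _assign(weights, codebook)
        # Update step: move each centroid to the mean of its members.
        sums = [0.0] * num_clusters
        counts = [0] * num_clusters
        for w, c in zip(weights, indices):
            sums[c] += w
            counts[c] += 1
        # An empty cluster keeps its previous centroid.
        codebook = [sums[c] / counts[c] if counts[c] else codebook[c]
                    for c in range(num_clusters)]
    return codebook, _assign(weights, codebook)


def weight_traffic_ratio(num_weights, num_clusters, original_bits=32):
    """Clustered weight bits (indices plus codebook) over original weight bits."""
    clustered = (num_weights * index_bits(num_clusters)
                 + num_clusters * original_bits)
    return clustered / (num_weights * original_bits)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for a layer's weights; not data from the paper.
    layer = [random.gauss(0.0, 0.1) for _ in range(1000)]
    for k in (32, 256):
        codebook, indices = cluster_weights(layer, k, iterations=5)
        mse = sum((w - codebook[i]) ** 2
                  for w, i in zip(layer, indices)) / len(layer)
        print(k, index_bits(k), weight_traffic_ratio(len(layer), k), mse)
```

With 32 clusters each weight is fetched as a 5-bit index, and with 256 clusters as an 8-bit index, so for a large layer the weight traffic alone approaches 5/32 and 8/32 of the original. The codebook overhead only matters when a layer has few weights.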


        • Published in

          SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
          April 2022
          2099 pages
          ISBN: 9781450387132
          DOI: 10.1145/3477314

          Copyright © 2022 Owner/Author

          Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%