DOI: 10.1145/3240765.3240775
research-article

Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs

Published: 05 November 2018

ABSTRACT

Rapid improvements in computational capability have made convolutional neural networks (CNNs) highly successful on image classification tasks in recent years, which has in turn spurred the development of object detection algorithms with significantly improved accuracy. During deployment, however, many applications demand low-latency processing of a single image under strict power consumption requirements. These constraints reduce the efficiency of GPUs and other general-purpose platforms and create opportunities for dedicated acceleration hardware, e.g., FPGAs, whose digital circuits can be customized for the inference algorithm. This work therefore proposes customizing the detection algorithm, e.g., SSD, so that its hardware implementation benefits from low data precision at the cost of marginal accuracy degradation. The proposed FPGA-based deep learning inference accelerator is demonstrated on two Intel FPGAs running the SSD algorithm, achieving up to 2.18 TOPS throughput and up to 3.3× better energy efficiency than a GPU.
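
To illustrate what "low data precision" can mean in practice, the following is a minimal sketch of signed fixed-point quantization, one common way to shrink weights and activations for hardware. The abstract does not state the bit widths or the quantization scheme actually used, so the function names, the 8-bit word length, and the 4 fractional bits below are all assumptions chosen for illustration only.

```python
def quantize_fixed_point(values, total_bits=8, frac_bits=4):
    """Map floats to signed fixed-point integer codes (hypothetical helper).

    total_bits and frac_bits are illustrative assumptions, not values
    taken from the paper.
    """
    scale = 1 << frac_bits                 # number of quantization steps per unit
    q_min = -(1 << (total_bits - 1))       # most negative representable code
    q_max = (1 << (total_bits - 1)) - 1    # most positive representable code
    codes = []
    for v in values:
        code = round(v * scale)            # round to the nearest grid point
        code = max(q_min, min(q_max, code))  # saturate instead of wrapping
        codes.append(code)
    return codes


def dequantize(codes, frac_bits=4):
    """Convert integer codes back to the floats they represent."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


# Example values; the last one is out of range and saturates.
weights = [0.30, -1.27, 0.05, 9.0]
codes = quantize_fixed_point(weights, total_bits=8, frac_bits=4)
approx = dequantize(codes, frac_bits=4)

# Rounding error for the in-range values is at most half a step (1/32 here).
max_error = max(abs(w - a) for w, a in zip(weights[:3], approx[:3]))
print(codes, approx, max_error)
```

In this sketch the in-range values are reproduced to within half a quantization step, while the out-of-range value is clipped to the largest representable code. Trading a small, bounded error of this kind for narrower arithmetic is the general idea behind accepting marginal accuracy degradation in exchange for cheaper hardware.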


Published in

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Nov 2018
939 pages

Copyright © 2018

Publisher: IEEE Press
