research-article

Algorithm-Hardware Co-Design of Single Shot Detector for Fast Object Detection on FPGAs

Authors:

Yufei Ma
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, USA

Tu Zheng
College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Yu Cao
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, USA

Sarma Vrudhula
School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, USA

Jae-sun Seo
School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, USA

2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2018, Pages 1–8. https://doi.org/10.1145/3240765.3240775

Published: 05 November 2018

ABSTRACT

The rapid improvement in computational capability has made convolutional neural networks (CNNs) highly successful on image classification tasks in recent years, which has also spurred the development of object detection algorithms with significantly improved accuracy. During deployment, however, many applications demand low-latency processing of a single image under strict power constraints. These conditions reduce the efficiency of GPUs and other general-purpose platforms, creating opportunities for specialized acceleration hardware, e.g. FPGAs, whose digital circuits can be customized for a specific inference algorithm. This work therefore customizes a detection algorithm, the Single Shot Detector (SSD), so that it can be implemented in hardware with low data precision at the cost of only marginal accuracy degradation. The proposed FPGA-based deep learning inference accelerator is demonstrated for SSD on two Intel FPGAs, achieving up to 2.18 TOPS throughput and up to 3.3× better energy efficiency than a GPU.
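In FPGA accelerators, "low data precision" typically means storing weights and activations as short fixed-point integers rather than 32-bit floating point. The sketch below illustrates the general idea with per-tensor dynamic fixed point: it chooses the number of fractional bits from a tensor's largest magnitude, rounds each value to the nearest integer code, and saturates anything out of range. The bit widths, the rounding mode, the example weights, and the helper names (`choose_frac_bits`, `quantize`, `dequantize`, `max_abs_error`) are illustrative assumptions, not the quantization scheme used in this paper.

```python
import math
from typing import List


def choose_frac_bits(values: List[float], total_bits: int) -> int:
    """Choose fractional bits for a signed dynamic fixed-point word.

    Assumed format (illustrative): 1 sign bit, int_bits integer bits and
    total_bits - 1 - int_bits fractional bits, shared by the whole tensor.
    int_bits may be negative for small-magnitude tensors, which simply
    gives more fractional resolution.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return total_bits - 1  # all zeros: any scale works
    # Smallest int_bits with 2**int_bits > max_abs.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values: List[float], total_bits: int, frac_bits: int) -> List[int]:
    """Round each value to the nearest fixed-point code, saturating on overflow."""
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    codes = []
    for v in values:
        q = round(v * scale)  # Python's round() resolves ties to even
        codes.append(min(q_max, max(q_min, q)))
    return codes


def dequantize(codes: List[int], frac_bits: int) -> List[float]:
    """Map integer codes back to real values."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


def max_abs_error(reference: List[float], approx: List[float]) -> float:
    """Largest element-wise absolute difference between two lists."""
    return max(abs(r - a) for r, a in zip(reference, approx))


if __name__ == "__main__":
    weights = [0.8, -0.31, 0.052, -0.97, 0.44]  # made-up example values
    for bits in (8, 4):
        frac = choose_frac_bits(weights, bits)
        codes = quantize(weights, bits, frac)
        err = max_abs_error(weights, dequantize(codes, frac))
        print(f"{bits}-bit: frac_bits={frac}, codes={codes}, max error={err:.4f}")
```

For values that do not saturate, the rounding error is at most half a least-significant bit, i.e. 2^-(frac_bits+1), so each bit removed roughly doubles the worst-case error. Narrower words reduce multiplier cost, on-chip storage, and memory traffic, which is the hardware side of the precision/accuracy trade-off described above.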


Index Terms

1. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
  2. Very large scale integration design
    1. Application-specific VLSI designs


Published in
2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2018, 939 pages
Copyright © 2018
Publisher: IEEE Press