A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment

Authors:
Huiyu Mo, Institute of Microelectronics, Tsinghua University, Beijing, China
Leibo Liu, Institute of Microelectronics, Tsinghua University, Beijing, China
Wenping Zhu, Institute of Semiconductors, Chinese Academy of Sciences, Beijing, China
Qiang Li, Intel Corporation, Beijing, China
Hong Liu, Institute of Microelectronics, Tsinghua University, Beijing, China
Wenjing Hu, Institute of Microelectronics, Tsinghua University, Beijing, China
Yao Wang, Institute of Microelectronics, Tsinghua University, Beijing, China
Shaojun Wei, Institute of Microelectronics, Tsinghua University, Beijing, China

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019, June 2019, Article No.: 80, Pages 1–6, https://doi.org/10.1145/3316781.3317736

Published: 02 June 2019

ABSTRACT

Face detection and alignment are highly correlated, computation-intensive tasks, yet no face-oriented accelerator flexibly supports both. This work proposes the first unified accelerator for multi-face detection and alignment, together with optimizations of the multi-task cascaded convolutional networks algorithm that it implements. First, a clustering non-maximum suppression is proposed that significantly reduces intersection-over-union computation and eliminates the hardware-interference sorting process, bringing a 16.0% speed-up without any loss. Second, a new pipeline architecture implements the proposal network in a more computation-efficient manner, with 41.7% less multiplier usage and a 38.3% decrease in memory capacity compared with the similar method. Third, a batch schedule mechanism improves the hardware utilization of the fully-connected layer by 16.7% on average when the number of inputs in a batch varies. Based on the TSMC 28 nm CMOS process, the accelerator takes only 6.7 ms at 400 MHz to process 5 faces per image simultaneously and achieves a power efficiency of 1.17 TOPS/W, which is 54.8× higher than the state-of-the-art solution.
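For context, the conventional greedy non-maximum suppression that a clustering variant would be compared against can be sketched as follows. This is the textbook baseline, not the clustering NMS proposed in the paper; the corner box format `(x1, y1, x2, y2)`, the 0.5 overlap threshold, and the names `iou` and `greedy_nms` are assumptions made for illustration. The sketch makes visible the two costs the abstract targets: the score sort and the repeated intersection-over-union evaluations.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               threshold: float = 0.5) -> List[int]:
    """Baseline NMS: returns indices of kept boxes, highest score first."""
    # The global sort by score is the step that is awkward in hardware.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Every remaining candidate costs one IoU evaluation per kept box.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

In the usage example, the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap and is kept.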

Published in

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019
1378 pages
ISBN: 9781450367257
DOI: 10.1145/3316781

Copyright © 2019 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall acceptance rate: 1,770 of 5,499 submissions (32%)