ABSTRACT
Face detection and alignment are highly correlated, computation-intensive tasks, yet no face-oriented accelerator flexibly supports both. This work proposes the first unified accelerator for multi-face detection and alignment, together with optimizations of the multi-task cascaded convolutional networks (MTCNN) algorithm, to implement both tasks. First, a clustering non-maximum suppression is proposed that significantly reduces intersection over union (IoU) computation and eliminates the hardware-interference sorting process, bringing a 16.0% speed-up without any loss. Second, a new pipeline architecture is presented that implements the proposal network in a more computation-efficient manner, with 41.7% less multiplier usage and a 38.3% decrease in memory capacity compared with a similar method. Third, a batch schedule mechanism is proposed that improves hardware utilization of the fully-connected layers by 16.7% on average when the number of inputs in a batch varies. Based on the TSMC 28 nm CMOS process, the accelerator takes only 6.7 ms at 400 MHz to simultaneously process 5 faces per image and achieves a power efficiency of 1.17 TOPS/W, which is 54.8× higher than the state-of-the-art solution.
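The abstract does not describe the clustering non-maximum suppression in detail. For context, the sketch below shows conventional greedy NMS, which depends on the score sort and the repeated IoU computations that the abstract refers to. It is not the proposed method. The corner box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               threshold: float = 0.5) -> List[int]:
    """Conventional NMS: sort by score, then drop boxes that overlap a kept box."""
    # The global sort is the step that is awkward to implement in hardware.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Each candidate is compared against every box kept so far.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep
```

In this baseline, every candidate is checked against all previously kept boxes, so the number of IoU evaluations grows quickly with the number of candidates, and the sort must finish before suppression can start.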