DOI: 10.1145/3316781.3317736

A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment

Published: 02 June 2019

ABSTRACT

Face detection and alignment are highly correlated, computation-intensive tasks, yet no face-oriented accelerator flexibly supports both. This work proposes the first unified accelerator for multi-face detection and alignment, together with optimizations to the multi-task cascaded convolutional networks (MTCNN) algorithm that it uses to implement both tasks. First, a clustering non-maximum suppression is proposed that significantly reduces intersection-over-union computation and eliminates the hardware-interference sorting process, bringing a 16.0% speed-up without any loss. Second, a new pipeline architecture implements the proposal network in a more computation-efficient manner, with 41.7% less multiplier usage and a 38.3% decrease in memory capacity compared with a similar method. Third, a batch schedule mechanism improves hardware utilization of the fully-connected layer by 16.7% on average when the number of inputs in a batch varies. Implemented in a TSMC 28 nm CMOS process, the accelerator takes only 6.7 ms at 400 MHz to process 5 faces simultaneously in each image and achieves a power efficiency of 1.17 TOPS/W, which is 54.8× higher than the state-of-the-art solution.
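For context, the conventional non-maximum suppression that the first contribution sets out to improve can be sketched as follows. This is a minimal illustration of the standard greedy procedure, with its score sort and pairwise intersection-over-union tests, and not the clustering variant proposed in the paper, whose details the abstract does not give. The names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Standard greedy NMS: returns indices of the boxes that are kept."""
    # The global sort by score is the step that is awkward to map onto hardware.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # A candidate survives only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidates = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(greedy_nms(candidates, confidences))
```

In this baseline, the number of IoU evaluations grows with the number of candidate boxes times the number of kept boxes, and the sort must finish before suppression begins. These are the two costs the abstract says the clustering approach reduces or removes.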


  • Published in

    DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
    June 2019
    1378 pages
    ISBN:9781450367257
    DOI:10.1145/3316781

    Copyright © 2019 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
