DOI: 10.1145/3240765.3240810

FCN-Engine: Accelerating Deconvolutional Layers in Classic CNN Processors

Published: 05 November 2018

Abstract

Unlike standard Convolutional Neural Networks (CNNs) with fully-connected layers, Fully Convolutional Neural Networks (FCNs) are prevalent in computer vision applications such as object detection, semantic/image segmentation, and popular generative tasks based on Generative Adversarial Networks (GANs). In an FCN, traditional convolutional layers and deconvolutional layers contribute the majority of the computational complexity. However, prior deep-learning accelerator designs mostly focus on CNN optimization: they either use independent compute resources to handle deconvolution or convert deconvolutional layers (Deconv) into general convolution operations, which incurs considerable overhead. To address this problem, we propose a unified fully convolutional accelerator that handles both deconvolutional and convolutional layers with a single processing-element (PE) array. We re-optimize the conventional CNN accelerator architecture, a regular 2D array of processing elements, so that it supports the data flow of deconvolutional-layer inference more efficiently. By exploiting the locality in deconvolutional filters, this architecture reduces on-chip memory traffic from 24.79 GB to 6.56 GB and improves power efficiency significantly. Compared to the prior baseline deconvolution acceleration scheme, the proposed accelerator achieves a 1.3X–44.9X speedup and reduces energy consumption by 14.6%–97.6% on a set of representative benchmark applications. Meanwhile, it maintains CNN inference performance similar to that of an optimized CNN-only accelerator, with negligible power-consumption and chip-area overhead.
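The conversion the abstract criticizes can be made concrete: a deconvolutional (transposed-convolution) layer can be rewritten as an ordinary convolution over a zero-inserted input, and the inserted zeros are exactly where the overhead comes from, since most multiply-accumulates then touch a zero operand. A minimal 1D NumPy sketch (not from the paper; function names are illustrative):

```python
import numpy as np

def deconv1d_direct(x, w, stride):
    """Transposed convolution computed directly: each input element
    scatters a scaled copy of the kernel into the output."""
    n, k = len(x), len(w)
    y = np.zeros((n - 1) * stride + k)
    for i in range(n):
        y[i * stride : i * stride + k] += x[i] * w
    return y

def deconv1d_as_conv(x, w, stride):
    """Same result via the convolution conversion: zero-insert
    (upsample) the input, pad, then slide the flipped kernel as an
    ordinary convolution. For stride s, roughly (s-1)/s of the MAC
    operands are the inserted zeros -- wasted work on a CNN engine."""
    n, k = len(x), len(w)
    up = np.zeros((n - 1) * stride + 1)
    up[::stride] = x                 # insert stride-1 zeros between samples
    up = np.pad(up, k - 1)           # full padding for the boundary taps
    wf = w[::-1]                     # flip kernel: correlation -> convolution
    return np.array([up[t:t + k] @ wf for t in range(len(up) - k + 1)])

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 0.5, 0.25])
assert np.allclose(deconv1d_direct(x, w, 2), deconv1d_as_conv(x, w, 2))
```

The two routines produce identical outputs; the difference is that the converted form enlarges the input by roughly the stride factor before convolving, which is the overhead a unified PE array like the one proposed here avoids.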





        Published In

        2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
        Nov 2018
        939 pages

        Publisher

        IEEE Press


        Qualifiers

        • Research-article


Cited By

• FTA-GAN: A Computation-Efficient Accelerator for GANs With Fast Transformation Algorithm. IEEE Transactions on Neural Networks and Learning Systems 34(6):2978–2992, Jun 2023. DOI: 10.1109/TNNLS.2021.3110728
• An Efficient Dataflow for Convolutional Generative Models. 2023 International Conference on Field Programmable Technology (ICFPT), pp. 53–59, 12 Dec 2023. DOI: 10.1109/ICFPT59805.2023.00011
• A GPU-accelerated real-time human voice separation framework for mobile phones. Journal of Systems Architecture 145:103005, Dec 2023. DOI: 10.1016/j.sysarc.2023.103005
• Evolving CNN with Paddy Field Algorithm for Geographical Landmark Recognition. Electronics 11(7):1075, 29 Mar 2022. DOI: 10.3390/electronics11071075
• An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA. ACM Transactions on Embedded Computing Systems 22(6):1–22, 1 Sep 2022. DOI: 10.1145/3561053
• VoiceBit: GPU-Accelerated Real-Time Human Voice Separation for Mobile Phones. 2022 IEEE HPCC/DSS/SmartCity/DependSys, pp. 1987–1994, Dec 2022. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00296
• GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation. IEEE Journal of Solid-State Circuits 56(9):2845–2857, Sep 2021. DOI: 10.1109/JSSC.2021.3066572
• TOCO: A Systolic Network for Efficient Transposed Convolutions with Output-Reuse Paths. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), p. 275, May 2021. DOI: 10.1109/FCCM51124.2021.00060
• Binarized Encoder-Decoder Network and Binarized Deconvolution Engine for Semantic Segmentation. IEEE Access 9:8006–8027, 2021. DOI: 10.1109/ACCESS.2020.3048375
• A survey of hardware architectures for generative adversarial networks. Journal of Systems Architecture, article 102227, Jun 2021. DOI: 10.1016/j.sysarc.2021.102227
