DOI: 10.1145/3240765.3240810

FCN-Engine: Accelerating Deconvolutional Layers in Classic CNN Processors

Published: 05 November 2018

Abstract

Unlike standard Convolutional Neural Networks (CNNs) with fully-connected layers, Fully Convolutional Neural Networks (FCNs) are prevalent in computer vision applications such as object detection, semantic/image segmentation, and popular generative tasks based on Generative Adversarial Networks (GANs). In an FCN, traditional convolutional layers and deconvolutional layers contribute the majority of the computational complexity. However, prior deep-learning accelerator designs mostly focus on CNN optimization: they either use independent compute resources to handle deconvolution or convert deconvolutional layers (Deconv) into general convolution operations, which incurs considerable overhead. To address this problem, we propose a unified fully convolutional accelerator that handles both deconvolutional and convolutional layers with a single processing-element (PE) array. We re-optimize the conventional CNN accelerator architecture, a regular 2D array of processing elements, so that it supports the data flow of deconvolutional-layer inference more efficiently. By exploiting the locality in deconvolutional filters, this architecture reduces on-chip memory traffic from 24.79 GB to 6.56 GB and improves power efficiency significantly. Compared to the prior baseline deconvolution acceleration scheme, the proposed accelerator achieves a 1.3X–44.9X speedup and reduces energy consumption by 14.6%–97.6% on a set of representative benchmark applications. Meanwhile, it maintains CNN inference performance similar to that of an optimized CNN-only accelerator, with negligible power-consumption and chip-area overhead.
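The conversion the abstract criticizes can be made concrete: a deconvolutional (transposed-convolution) layer can be rewritten as an ordinary convolution over a zero-inserted input, and the inserted zeros are exactly where the overhead comes from, since most multiply-accumulates then touch a zero operand. A minimal 1D NumPy sketch (not from the paper; function names are illustrative):

```python
import numpy as np

def deconv1d_direct(x, w, stride):
    """Transposed convolution computed directly: each input element
    scatters a scaled copy of the kernel into the output."""
    n, k = len(x), len(w)
    y = np.zeros((n - 1) * stride + k)
    for i in range(n):
        y[i * stride : i * stride + k] += x[i] * w
    return y

def deconv1d_as_conv(x, w, stride):
    """Same result via the convolution conversion: zero-insert
    (upsample) the input, pad, then slide the flipped kernel as an
    ordinary convolution. For stride s, roughly (s-1)/s of the MAC
    operands are the inserted zeros -- wasted work on a CNN engine."""
    n, k = len(x), len(w)
    up = np.zeros((n - 1) * stride + 1)
    up[::stride] = x                 # insert stride-1 zeros between samples
    up = np.pad(up, k - 1)           # full padding for the boundary taps
    wf = w[::-1]                     # flip kernel: correlation -> convolution
    return np.array([up[t:t + k] @ wf for t in range(len(up) - k + 1)])

x = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 0.5, 0.25])
assert np.allclose(deconv1d_direct(x, w, 2), deconv1d_as_conv(x, w, 2))
```

The two routines produce identical outputs; the difference is that the converted form enlarges the input by roughly the stride factor before convolving, which is the overhead a unified PE array like the one proposed here avoids.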





        Published In

        2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
        Nov 2018
        939 pages

        Publisher

        IEEE Press


        Qualifiers

        • Research-article


Cited By

• FTA-GAN: A Computation-Efficient Accelerator for GANs With Fast Transformation Algorithm. IEEE Transactions on Neural Networks and Learning Systems 34(6):2978–2992, Jun 2023. DOI: 10.1109/TNNLS.2021.3110728
• An Efficient Dataflow for Convolutional Generative Models. 2023 International Conference on Field Programmable Technology (ICFPT), pp. 53–59, 12 Dec 2023. DOI: 10.1109/ICFPT59805.2023.00011
• A GPU-accelerated real-time human voice separation framework for mobile phones. Journal of Systems Architecture 145:103005, Dec 2023. DOI: 10.1016/j.sysarc.2023.103005
• Evolving CNN with Paddy Field Algorithm for Geographical Landmark Recognition. Electronics 11(7):1075, 29 Mar 2022. DOI: 10.3390/electronics11071075
• An Intermediate-Centric Dataflow for Transposed Convolution Acceleration on FPGA. ACM Transactions on Embedded Computing Systems 22(6):1–22, 1 Sep 2022. DOI: 10.1145/3561053
• VoiceBit: GPU-Accelerated Real-Time Human Voice Separation for Mobile Phones. 2022 IEEE HPCC/DSS/SmartCity/DependSys, pp. 1987–1994, Dec 2022. DOI: 10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00296
• GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation. IEEE Journal of Solid-State Circuits 56(9):2845–2857, Sep 2021. DOI: 10.1109/JSSC.2021.3066572
• TOCO: A Systolic Network for Efficient Transposed Convolutions with Output-Reuse Paths. 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), p. 275, May 2021. DOI: 10.1109/FCCM51124.2021.00060
• Binarized Encoder-Decoder Network and Binarized Deconvolution Engine for Semantic Segmentation. IEEE Access 9:8006–8027, 2021. DOI: 10.1109/ACCESS.2020.3048375
• A survey of hardware architectures for generative adversarial networks. Journal of Systems Architecture, article 102227, Jun 2021. DOI: 10.1016/j.sysarc.2021.102227
