ABSTRACT
The recently proposed Deformable Convolutional Networks (DCNs) greatly enhance the performance of conventional Convolutional Neural Networks (CNNs) on vision recognition tasks by allowing flexible input sampling at inference time. DCNs introduce an additional convolutional layer that generates adaptive sampling offsets, followed by a bilinear interpolation (BLI) step that samples the input at the resulting non-integer locations; a regular convolution is then performed on the sampled input pixels. Compared with conventional CNNs, DCNs exhibit significantly higher computational complexity and irregular, input-dependent memory access patterns, making it a great challenge to deploy DCNs on edge devices for real-time computer vision tasks. In this work, we propose RECOIN, a processing-in-memory (PIM) architecture that supports DCN inference on resistive memory (ReRAM) crossbars, making it the first DCN inference accelerator. We present a novel BLI processing engine that leverages both row- and column-oriented computation for in-situ BLI calculation. A mapping scheme and an address converter are specifically designed to accommodate the intensive computation and irregular data access. We implement DCN inference in a 4-stage pipeline and evaluate the effectiveness of RECOIN on six DCN models. Experimental results show that RECOIN achieves 225× and 17.4× improvement in energy efficiency over a general-purpose CPU and GPU, respectively. Compared to two state-of-the-art ASIC accelerators, RECOIN achieves 26.8× and 20.4× speedup, respectively.
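The BLI step described above can be illustrated with a minimal sketch. This is not RECOIN's in-crossbar implementation; it is a plain software rendering of the standard bilinear interpolation used in DCNs, where a feature map is sampled at a fractional location produced by the learned offsets (the function name `bilinear_sample` and the zero-padding convention for out-of-bounds samples are assumptions for illustration):

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Sample a 2D feature map at fractional coordinates (y, x) via BLI.

    The result is a weighted sum of the four integer-grid neighbors,
    each weighted by its proximity to the fractional point. DCNs apply
    this per sampling location after adding the predicted offsets.
    """
    H, W = feature.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0  # fractional parts -> interpolation weights

    def pix(r, c):
        # Out-of-bounds samples are treated as zero (a common DCN convention).
        return feature[r, c] if 0 <= r < H and 0 <= c < W else 0.0

    return ((1 - wy) * (1 - wx) * pix(y0, x0)
            + (1 - wy) * wx * pix(y0, x1)
            + wy * (1 - wx) * pix(y1, x0)
            + wy * wx * pix(y1, x1))

# Sampling at the center of a 2x2 patch averages its four values.
patch = np.array([[0.0, 1.0], [2.0, 3.0]])
center = bilinear_sample(patch, 0.5, 0.5)  # -> 1.5
```

Because the four weights are products of the two fractional parts, the computation decomposes into a row interpolation followed by a column interpolation, which is the separability that a row- and column-oriented in-situ engine can exploit.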
RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution