DOI: 10.1145/3061639.3062242

Exploiting Parallelism for Convolutional Connections in Processing-In-Memory Architecture

Published: 18 June 2017

Abstract

Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems with unprecedented accuracy, but at the cost of substantial data movement. Although recent developments in processing-in-memory (PIM) architecture seek to minimize data movement by computing on data directly at dedicated nonvolatile devices, how to jointly exploit the computation capability of PIM and the highly parallel structure of neural networks remains a critical issue.
This paper presents Para-CONV, which exploits parallelism for deterministic convolutional connections in PIM architecture. Para-CONV achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to minimize data movement and data fetching from off-PE DRAM caused by inter-PE communication. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. Para-CONV is evaluated on a set of benchmarks drawn from both real-life CNN applications and synthetic task graphs. The experimental results show that Para-CONV significantly improves throughput and reduces data movement compared with the baseline scheme.
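To give a feel for the kind of dynamic-programming data allocation the abstract describes, the toy sketch below partitions contiguous feature-map rows across PEs so that each PE's local storage capacity is respected while total inter-PE halo traffic is minimized. This is an illustrative formulation only: the function name `allocate_rows`, the cost model (a k-tall kernel forces a fetch of k-1 boundary rows at each split), and the capacity constraint are assumptions for the sketch, not the paper's exact model.

```python
def allocate_rows(row_width, p, cap, k):
    """Toy DP data allocation (illustrative; not the paper's exact model).

    Assign n contiguous feature-map rows to p PEs so that each PE stores
    at most `cap` rows; a split between rows s-1 and s forces the next PE
    to fetch a halo of (k - 1) rows for a k-tall convolution kernel, at a
    cost of (k - 1) * row_width[s].  Returns (partition, total_cost) or
    (None, inf) when infeasible.
    """
    n = len(row_width)
    INF = float("inf")
    # dp[j][i]: minimum inter-PE traffic to place the first i rows on j PEs
    dp = [[INF] * (n + 1) for _ in range(p + 1)]
    dp[0][0] = 0
    choice = [[-1] * (n + 1) for _ in range(p + 1)]
    for j in range(1, p + 1):
        for i in range(1, n + 1):
            # PE j takes rows s..i-1 (at most `cap` of them)
            for s in range(max(0, i - cap), i):
                if dp[j - 1][s] == INF:
                    continue
                # halo traffic at the split before row s (none for the first PE)
                halo = 0 if s == 0 else (k - 1) * row_width[s]
                cost = dp[j - 1][s] + halo
                if cost < dp[j][i]:
                    dp[j][i] = cost
                    choice[j][i] = s
    if dp[p][n] == INF:
        return None, INF
    # Walk the choice table backwards to reconstruct the partition
    cuts, i = [], n
    for j in range(p, 0, -1):
        s = choice[j][i]
        cuts.append((s, i))
        i = s
    return list(reversed(cuts)), dp[p][n]
```

For example, six rows of width 8 split over three PEs with capacity 2 and a 3x3 kernel yields the even partition `[(0, 2), (2, 4), (4, 6)]` with two splits costing `2 * (3 - 1) * 8 = 32` units of halo traffic. Because the DP enumerates every feasible last-group boundary at each step, the returned allocation is optimal under this cost model, which mirrors the abstract's claim of an optimal dynamic-programming solution.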


Cited By

  • (2022) DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems. ACM Transactions on Design Automation of Electronic Systems, 28(3):1-30, Dec. 2022. DOI: 10.1145/3576196
  • (2021) Towards Efficient Allocation of Graph Convolutional Networks on Hybrid Computation-in-Memory Architecture. Science China Information Sciences, 64(6), May 2021. DOI: 10.1007/s11432-020-3248-y
  • (2019) Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture. IEEE Transactions on Parallel and Distributed Systems, 30(3):589-600, Mar. 2019. DOI: 10.1109/TPDS.2018.2868062

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
June 2017
533 pages
ISBN:9781450349277
DOI:10.1145/3061639

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Near-data processing
  2. neuromorphic computing
  3. non-volatile memory
  4. parallel computing
  5. scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '17

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

