DOI: 10.1145/3061639.3062242

Exploiting Parallelism for Convolutional Connections in Processing-In-Memory Architecture

Published: 18 June 2017

Abstract

Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems with unprecedented accuracy, but at the cost of substantial data movement. Although recent developments in processing-in-memory (PIM) architecture seek to minimize data movement by computing on data directly at dedicated nonvolatile devices, how to jointly exploit the computation capability of PIM and the highly parallel structure of neural networks remains a critical issue.
This paper presents Para-CONV, which exploits parallelism for deterministic convolutional connections in PIM architecture. Para-CONV achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to minimize data movement and data fetching from off-PE DRAM caused by inter-PE communication. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. Para-CONV is evaluated on a set of benchmarks drawn from both real-life CNN applications and synthetic task graphs. The experimental results show that Para-CONV significantly improves throughput and reduces data movement compared with the baseline scheme.
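To give a feel for the kind of dynamic-programming data allocation the abstract describes, the toy sketch below partitions contiguous feature-map rows across PEs so that each PE's local storage capacity is respected while total inter-PE halo traffic is minimized. This is an illustrative formulation only: the function name `allocate_rows`, the cost model (a k-tall kernel forces a fetch of k-1 boundary rows at each split), and the capacity constraint are assumptions for the sketch, not the paper's exact model.

```python
def allocate_rows(row_width, p, cap, k):
    """Toy DP data allocation (illustrative; not the paper's exact model).

    Assign n contiguous feature-map rows to p PEs so that each PE stores
    at most `cap` rows; a split between rows s-1 and s forces the next PE
    to fetch a halo of (k - 1) rows for a k-tall convolution kernel, at a
    cost of (k - 1) * row_width[s].  Returns (partition, total_cost) or
    (None, inf) when infeasible.
    """
    n = len(row_width)
    INF = float("inf")
    # dp[j][i]: minimum inter-PE traffic to place the first i rows on j PEs
    dp = [[INF] * (n + 1) for _ in range(p + 1)]
    dp[0][0] = 0
    choice = [[-1] * (n + 1) for _ in range(p + 1)]
    for j in range(1, p + 1):
        for i in range(1, n + 1):
            # PE j takes rows s..i-1 (at most `cap` of them)
            for s in range(max(0, i - cap), i):
                if dp[j - 1][s] == INF:
                    continue
                # halo traffic at the split before row s (none for the first PE)
                halo = 0 if s == 0 else (k - 1) * row_width[s]
                cost = dp[j - 1][s] + halo
                if cost < dp[j][i]:
                    dp[j][i] = cost
                    choice[j][i] = s
    if dp[p][n] == INF:
        return None, INF
    # Walk the choice table backwards to reconstruct the partition
    cuts, i = [], n
    for j in range(p, 0, -1):
        s = choice[j][i]
        cuts.append((s, i))
        i = s
    return list(reversed(cuts)), dp[p][n]
```

For example, six rows of width 8 split over three PEs with capacity 2 and a 3x3 kernel yields the even partition `[(0, 2), (2, 4), (4, 6)]` with two splits costing `2 * (3 - 1) * 8 = 32` units of halo traffic. Because the DP enumerates every feasible last-group boundary at each step, the returned allocation is optimal under this cost model, which mirrors the abstract's claim of an optimal dynamic-programming solution.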


Cited By

  • (2022) DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems. ACM Transactions on Design Automation of Electronic Systems, 28(3):1-30, Dec. 2022. DOI: 10.1145/3576196
  • (2021) Towards Efficient Allocation of Graph Convolutional Networks on Hybrid Computation-in-Memory Architecture. Science China Information Sciences, 64(6), May 2021. DOI: 10.1007/s11432-020-3248-y
  • (2019) Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture. IEEE Transactions on Parallel and Distributed Systems, 30(3):589-600, Mar. 2019. DOI: 10.1109/TPDS.2018.2868062

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
June 2017
533 pages
ISBN:9781450349277
DOI:10.1145/3061639

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Near-data processing
  2. neuromorphic computing
  3. non-volatile memory
  4. parallel computing
  5. scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DAC '17

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

