research-article

ProPRAM: exploiting the transparent logic resources in non-volatile memory for near data computing

Authors:

Xiaowei Li

DAC '15: Proceedings of the 52nd Annual Design Automation Conference

Article No.: 47, Pages 1 - 6

https://doi.org/10.1145/2744769.2744896

Published: 07 June 2015

Abstract

Emerging highly parallel and big data applications have renewed research interest in Processing-in-Memory (PIM) architectures. However, moving powerful processing units into CMOS-incompatible DRAM chips is not cost-effective for large-capacity memory. In this work, we observe that Non-Volatile Memory (NVM) often naturally incorporates basic logic, such as Data Comparison Write or Flip-N-Write modules, that is essential for cell SET/RESET operations. In contrast to conventional PIM or Near Data Computing (NDC) architectures, ProPRAM, as a typical Active NVM, abandons the approach of moving accelerators or customized processors into memory devices. Instead, it starts by exploiting the resources that already exist inside the memory chips to accelerate the key non-compute-intensive functions of emerging big data applications. With slight hardware and architectural modifications, we expose the otherwise transparent peripheral logic to the application layer through an instruction set extension and exploit it for in-field bulk data processing at limited hardware cost. Compared to conventional CPU-centric systems, ProPRAM improves energy efficiency by 15x on important data-intensive micro-benchmarks and kernels.
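To make the write-path logic named in the abstract concrete, the following is a minimal software sketch of Data Comparison Write and Flip-N-Write, plus one way their comparison step could double as a data-compare primitive. It is an illustration under stated assumptions, not the ProPRAM design: the 8-bit word width and the names `dcw_write`, `fnw_write`, `fnw_read`, and `words_equal` are hypothetical, and the paper's actual instruction set extension is not shown on this page.

```python
WORD_BITS = 8                      # illustrative word width (assumption)
MASK = (1 << WORD_BITS) - 1


def popcount(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def dcw_write(old: int, new: int) -> int:
    """Data Comparison Write: read the old word, compare it with the new
    one, and return a mask of only those cells that must be programmed."""
    return (old ^ new) & MASK


def fnw_write(old_stored: int, old_flip: int, new: int):
    """Flip-N-Write: store either `new` or its bitwise inverse, whichever
    changes fewer cells, and record the choice in a one-bit flip flag.
    Returns (stored_word, flip_flag, cells_changed)."""
    new &= MASK
    inverted = ~new & MASK
    # Cost counts data cells that change plus the flip cell if it changes.
    plain_cost = popcount(old_stored ^ new) + (old_flip ^ 0)
    inverted_cost = popcount(old_stored ^ inverted) + (old_flip ^ 1)
    if inverted_cost < plain_cost:
        return inverted, 1, inverted_cost
    return new, 0, plain_cost


def fnw_read(stored: int, flip: int) -> int:
    """Recover the logical value from a Flip-N-Write encoded word."""
    return (~stored & MASK) if flip else stored


def words_equal(a: int, b: int) -> bool:
    """Hypothetical reuse of the comparison logic: two words are equal
    exactly when the DCW difference mask is empty."""
    return dcw_write(a, b) == 0


if __name__ == "__main__":
    stored, flip, cost = fnw_write(0b00000000, 0, 0b11111110)
    print(f"stored={stored:08b} flip={flip} cells_changed={cost}")
    print(f"read back={fnw_read(stored, flip):08b}")
    print(f"DCW mask={dcw_write(0b1010, 0b1001):08b}")
```

The point of the sketch is that both schemes already contain a read-compare step (an XOR and, for Flip-N-Write, a bit count), which is the kind of existing peripheral logic the abstract proposes to reuse rather than adding new processors to the memory chip.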

Index Terms

ProPRAM: exploiting the transparent logic resources in non-volatile memory for near data computing
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory

Published In

DAC '15: Proceedings of the 52nd Annual Design Automation Conference

June 2015

1204 pages

ISBN: 9781450335201

DOI: 10.1145/2744769

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China

Conference

DAC '15: The 52nd Annual Design Automation Conference 2015

June 7 - 11, 2015

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
