DOI: 10.1145/2744769.2744896
research-article

ProPRAM: exploiting the transparent logic resources in non-volatile memory for near data computing

Published: 07 June 2015

Abstract

Emerging highly parallel and big data applications have renewed research interest in Processing-in-Memory (PIM) architectures. However, moving powerful processing units into CMOS-incompatible DRAM chips is not cost-effective for large-capacity memory. In this work, we observe that Non-Volatile Memory (NVM) often already incorporates basic logic, such as Data Comparison Write or Flip-n-Write modules, that is essential for cell SET/RESET operations. In contrast to conventional PIM or Near Data Computing (NDC) architectures, ProPRAM, as a typical Active NVM, abandons the approach of moving accelerators or customized processors into memory devices. Instead, it starts by exploiting the existing resources inside the memory chips to accelerate key non-compute-intensive functions of emerging big data applications. With slight hardware and architectural modifications, we expose the otherwise transparent peripheral logic to the application layer through an instruction set extension and exploit it for in-field bulk data processing at limited hardware cost. Compared to conventional CPU-centric systems, ProPRAM improves energy efficiency by 15x for important data-intensive micro-benchmarks and kernels.
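
For readers unfamiliar with the two write schemes named in the abstract, the following is a minimal software sketch of the compare logic they rely on. It is not the authors' hardware or instruction set extension: the 16-bit word width and the function names (`dcw_bits_to_program`, `flip_n_write`, `read_word`) are assumptions made purely for illustration. Data Comparison Write reads the stored word and programs only the cells that differ from the new data. Flip-n-Write additionally keeps one flip bit per word and stores the inverted data whenever that needs fewer cell updates.

```python
WORD_BITS = 16                 # assumed word width, for illustration only
MASK = (1 << WORD_BITS) - 1


def popcount(x):
    """Number of 1 bits in x."""
    return bin(x).count("1")


def dcw_bits_to_program(old, new):
    """Data Comparison Write: XOR the stored word with the new data and
    return a mask of the cells that actually have to be programmed."""
    return (old ^ new) & MASK


def flip_n_write(old, old_flip, new):
    """Flip-n-Write: choose between storing `new` as-is or inverted.

    `old` is the bit pattern currently in the cells and `old_flip` its
    flip bit. Returns (stored_word, flip_bit, cells_programmed), where
    the cost counts data cells plus the flip-bit cell if it changes.
    """
    inverted = ~new & MASK
    plain_cost = popcount(dcw_bits_to_program(old, new)) + (old_flip ^ 0)
    inv_cost = popcount(dcw_bits_to_program(old, inverted)) + (old_flip ^ 1)
    # The two costs always sum to WORD_BITS + 1, so the smaller one is
    # at most (WORD_BITS + 1) // 2.
    if inv_cost < plain_cost:
        return inverted, 1, inv_cost
    return new, 0, plain_cost


def read_word(stored, flip):
    """Recover the logical value from the stored pattern and flip bit."""
    return (~stored & MASK) if flip else stored


if __name__ == "__main__":
    stored, flip, cost = flip_n_write(0x0000, 0, 0xFFFF)
    print(hex(stored), flip, cost, hex(read_word(stored, flip)))
```

Both schemes already contain a word-wide comparator (the XOR step) next to the memory array. Logic of this kind is what the abstract describes as existing peripheral resources that could serve bulk data processing; how ProPRAM actually exposes it is not shown here.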

    Published In

    DAC '15: Proceedings of the 52nd Annual Design Automation Conference
    June 2015
    1204 pages
    ISBN:9781450335201
    DOI:10.1145/2744769
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Conference

    DAC '15
    DAC '15: The 52nd Annual Design Automation Conference 2015
    June 7 - 11, 2015
    San Francisco, California

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
