ABSTRACT
In this paper, we propose a flexible processing-in-DRAM framework named FlexiDRAM that supports the efficient implementation of complex bulk bitwise operations. The framework is built on a new reconfigurable in-DRAM accelerator that leverages the analog operation of DRAM sub-arrays and extends it to implement XOR2-MAJ3 operations between operands stored on the same bit-line. FlexiDRAM first generates an efficient XOR-MAJ representation of the desired logic and then allocates DRAM rows to the operands accordingly to execute any in-DRAM computation. We develop the ISA and software support required to perform in-DRAM computation. FlexiDRAM transforms the current memory architecture into a massively parallel computational unit and can be leveraged to significantly reduce the latency and energy consumption of complex workloads. Our extensive circuit-to-architecture simulation results show that, averaged across two well-known deep learning workloads, FlexiDRAM achieves ∼15× energy saving and 13× speedup over the GPU, outperforming recent processing-in-DRAM platforms.
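To make the XOR-MAJ idea concrete, the following is a minimal software sketch, not the FlexiDRAM circuit, ISA, or logic mapping. It relies only on the standard identity that a full adder is one XOR chain (sum) plus one three-input majority (carry), and it models each DRAM row as a Python integer whose bit j belongs to bit-line (lane) j. All function names (`xor2`, `maj3`, `to_rows`, `from_rows`, `bit_serial_add`) are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def xor2(a: int, b: int) -> int:
    """Bulk two-input XOR across all lanes of two rows."""
    return a ^ b


def maj3(a: int, b: int, c: int) -> int:
    """Bulk three-input majority across all lanes of three rows."""
    return (a & b) | (a & c) | (b & c)


def to_rows(values: List[int], width: int) -> List[int]:
    """Transpose values into bit-planes: rows[i] holds bit i of every value."""
    rows = []
    for i in range(width):
        row = 0
        for lane, value in enumerate(values):
            row |= ((value >> i) & 1) << lane
        rows.append(row)
    return rows


def from_rows(rows: List[int], lanes: int) -> List[int]:
    """Inverse of to_rows: rebuild one integer per lane from the bit-planes."""
    return [
        sum(((row >> lane) & 1) << i for i, row in enumerate(rows))
        for lane in range(lanes)
    ]


def bit_serial_add(a_rows: List[int], b_rows: List[int]) -> Tuple[List[int], int]:
    """Add two bit-plane operands, least significant row first.

    Each step uses only XOR2 and MAJ3:
      sum   = a XOR b XOR carry
      carry = MAJ(a, b, carry)
    Returns the sum rows and the final carry-out row.
    """
    carry = 0
    sum_rows = []
    for a, b in zip(a_rows, b_rows):
        sum_rows.append(xor2(xor2(a, b), carry))
        carry = maj3(a, b, carry)
    return sum_rows, carry
```

For example, calling `bit_serial_add(to_rows([3, 5, 7, 0], 4), to_rows([1, 2, 9, 15], 4))` and passing the resulting rows through `from_rows(..., 4)` adds all four lanes at once, with each sum taken modulo 16 and overflow reported per lane in the carry-out row. The row-per-bit layout is an assumption made here to illustrate bulk bitwise parallelism; the abstract does not specify how FlexiDRAM lays out operands beyond placing them on the same bit-line.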