ABSTRACT
Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory).
To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables the system to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the system can use the inverters present inside the sense amplifier to perform bitwise NOT operations. With these two components, Ambit can perform any bulk bitwise operation efficiently inside DRAM. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore it can be directly plugged onto the memory bus.
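The abstract does not spell out why activating three rows yields AND and OR. As background from the full Ambit design rather than from this abstract: charge sharing drives each bitline to the majority value of the three activated cells, so a control row of all zeros selects AND and a control row of all ones selects OR. The Python sketch below models only this logical behavior on integers used as bit vectors. The function names and the `width` parameter are illustrative assumptions, and nothing here models DRAM timing or analog effects.

```python
def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority of three equal-width bit vectors stored as ints."""
    return (a & b) | (b & c) | (c & a)


def all_ones(width: int) -> int:
    """Bit vector of `width` ones (stands in for a control row of all 1s)."""
    return (1 << width) - 1


def bulk_and(a: int, b: int) -> int:
    # Control row of all zeros: MAJ(a, b, 0) == a AND b.
    return maj(a, b, 0)


def bulk_or(a: int, b: int, width: int) -> int:
    # Control row of all ones: MAJ(a, b, 1...1) == a OR b.
    return maj(a, b, all_ones(width))


def bulk_not(a: int, width: int) -> int:
    # Stands in for the inverter path; masked to the vector width.
    return ~a & all_ones(width)


def bulk_xor(a: int, b: int, width: int) -> int:
    # Composed from AND, OR, NOT only, illustrating that these three
    # primitives suffice for other bitwise operations.
    left = bulk_and(a, bulk_not(b, width))
    right = bulk_and(bulk_not(a, width), b)
    return bulk_or(left, right, width)


if __name__ == "__main__":
    a, b, width = 0b1100, 0b1010, 4
    print(format(bulk_and(a, b), "04b"))
    print(format(bulk_or(a, b, width), "04b"))
    print(format(bulk_not(a, width), "04b"))
    print(format(bulk_xor(a, b, width), "04b"))
```

In hardware each such operation would act on an entire DRAM row at once; the integers here are only a convenient stand-in for a row of bits.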
Our extensive circuit simulations show that Ambit works as expected even in the presence of significant process variation. Averaged across seven bulk bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with Hybrid Memory Cube (HMC), a 3D-stacked DRAM with a logic layer, Ambit improves performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC. Ambit improves the performance of three real-world data-intensive applications, 1) database bitmap indices, 2) BitWeaving, a technique to accelerate database scans, and 3) a bit-vector-based implementation of sets, by 3X-7X compared to a state-of-the-art baseline using SIMD optimizations. We describe four other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that the large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.
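To make the third application concrete, the following is a minimal sketch of a bit-vector-based set, assuming a small fixed universe of integer elements. Union, intersection, and difference each reduce to one bitwise operation over the whole vector, which is the kind of bulk operation the abstract targets. The helper names `to_bitvector` and `from_bitvector` are hypothetical and do not come from the paper.

```python
def to_bitvector(elements, universe_size: int) -> int:
    """Encode a set of ints in [0, universe_size) as a bit vector."""
    bits = 0
    for e in elements:
        if not 0 <= e < universe_size:
            raise ValueError(f"element {e} is outside the universe")
        bits |= 1 << e  # bit e is set when e is a member
    return bits


def from_bitvector(bits: int) -> set:
    """Decode a non-negative bit vector back into a Python set."""
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}


UNIVERSE = 8
a = to_bitvector({1, 3, 5}, UNIVERSE)
b = to_bitvector({3, 4, 5}, UNIVERSE)

union = a | b          # one bulk OR
intersection = a & b   # one bulk AND
difference = a & ~b    # one bulk NOT followed by one bulk AND

if __name__ == "__main__":
    print(from_bitvector(union))
    print(from_bitvector(intersection))
    print(from_bitvector(difference))
```

With a universe of millions of elements, each of these operations touches every bit of both vectors, which is why their throughput is bounded by memory bandwidth on a conventional processor.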
Index Terms
- Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology