Research article (Public Access)
DOI: 10.1145/3123939.3124544

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology

Published: 14 October 2017

ABSTRACT

Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent works design techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory).
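To see why throughput on a conventional processor is bounded by memory bandwidth, consider the CPU-side baseline for a single bulk AND, for example of two bitmap-index bitmaps. The sketch below is a minimal Python illustration, assuming bit vectors stored as arrays of 64-bit words that are far larger than the caches. Under that assumption, every operand word must be read from memory and every result word written back. The function names are hypothetical, and the throughput bound is a simple traffic ratio, not a model from the paper.

```python
from array import array


def bulk_and(a: array, b: array) -> array:
    """Word-at-a-time AND of two equal-length bit vectors (arrays of uint64 words)."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    # One AND per 64-bit word: almost no compute per byte moved, so the loop
    # runs only as fast as the memory system can stream a, b, and the result.
    return array("Q", (x & y for x, y in zip(a, b)))


def bandwidth_bound_throughput(bandwidth_gb_s: float, operands: int) -> float:
    """Upper bound on result bytes produced per second, in GB/s.

    Assumes no cache reuse: each operand vector is read once and the result
    vector is written once, so a binary operation (operands=2) moves 3 bytes
    per result byte and NOT (operands=1) moves 2.
    """
    bytes_per_result_byte = operands + 1
    return bandwidth_gb_s / bytes_per_result_byte
```

With a hypothetical 24 GB/s of channel bandwidth, `bandwidth_bound_throughput(24.0, 2)` gives at most 8 GB/s of AND results no matter how fast the core is. Ambit avoids this cap by never moving the operands across the memory bus.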

To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables the system to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the system can use the inverters present inside the sense amplifier to perform bitwise NOT operations. With these two components, Ambit can perform any bulk bitwise operation efficiently inside DRAM. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore it can be directly plugged onto the memory bus.
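The abstract does not explain why activating three rows yields AND and OR. In the full Ambit paper, simultaneous activation of three rows drives each bitline toward the majority value of the three cells that share it. If one of the three rows is first set to all 0s, the majority equals the AND of the other two. If it is set to all 1s, the majority equals their OR. Adding NOT from the sense-amplifier inverter gives a functionally complete set. The Python sketch below models only this logical behavior, not the analog circuit, and treats each DRAM row as an integer bit vector of a given width. The function names are hypothetical, and the XOR decomposition is one illustrative composition, not necessarily the command sequence Ambit issues.

```python
def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority: a result bit is 1 when at least two input bits are 1.

    Logical stand-in for the charge sharing of a triple-row activation.
    """
    return (a & b) | (b & c) | (c & a)


def ambit_and(a: int, b: int) -> int:
    # Third (control) row preset to all zeros: MAJ(a, b, 0) = a AND b.
    return maj(a, b, 0)


def ambit_or(a: int, b: int, width: int) -> int:
    # Third (control) row preset to all ones: MAJ(a, b, 1...1) = a OR b.
    return maj(a, b, (1 << width) - 1)


def ambit_not(a: int, width: int) -> int:
    # Inverted value from the sense amplifier, masked to the row width.
    return ~a & ((1 << width) - 1)


def ambit_xor(a: int, b: int, width: int) -> int:
    # a XOR b = (a AND NOT b) OR (NOT a AND b), built only from AND/OR/NOT.
    left = ambit_and(a, ambit_not(b, width))
    right = ambit_and(ambit_not(a, width), b)
    return ambit_or(left, right, width)
```

For 8-bit rows `a = 0b11001010` and `b = 0b10101100`, these functions give AND `0b10001000`, OR `0b11101110`, and XOR `0b01100110`, matching the ordinary bitwise operators.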

Our extensive circuit simulations show that Ambit works as expected even in the presence of significant process variation. Averaged across seven bulk bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with Hybrid Memory Cube (HMC), a 3D-stacked DRAM with a logic layer, Ambit improves performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC. Ambit improves the performance of three real-world data-intensive applications, 1) database bitmap indices, 2) BitWeaving, a technique to accelerate database scans, and 3) bit-vector-based implementation of sets, by 3X-7X compared to a state-of-the-art baseline using SIMD optimizations. We describe four other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.
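The third application listed above, bit-vector-based sets, maps set algebra directly onto bulk bitwise operations. Union is OR, intersection is AND, and difference is AND with the complement. Each of these becomes a single pass over the vectors, which is the kind of work Ambit can perform inside DRAM. The following is a minimal Python sketch, assuming a fixed universe of small non-negative integers. The class and method names are hypothetical, and this is not the benchmark code evaluated in the paper.

```python
class BitSet:
    """Set of integers in range(universe), stored as one bit vector (a Python int)."""

    def __init__(self, universe: int, bits: int = 0):
        self.universe = universe
        self.bits = bits

    @classmethod
    def from_iterable(cls, universe: int, items) -> "BitSet":
        bits = 0
        for x in items:
            if not 0 <= x < universe:
                raise ValueError(f"{x} is outside range({universe})")
            bits |= 1 << x  # element x is represented by bit x
        return cls(universe, bits)

    def _check(self, other: "BitSet") -> None:
        if self.universe != other.universe:
            raise ValueError("sets must share the same universe")

    def union(self, other: "BitSet") -> "BitSet":
        self._check(other)
        return BitSet(self.universe, self.bits | other.bits)  # bulk OR

    def intersection(self, other: "BitSet") -> "BitSet":
        self._check(other)
        return BitSet(self.universe, self.bits & other.bits)  # bulk AND

    def difference(self, other: "BitSet") -> "BitSet":
        self._check(other)
        # Bulk AND with NOT: self.bits is non-negative, so the result is too.
        return BitSet(self.universe, self.bits & ~other.bits)

    def __contains__(self, x: int) -> bool:
        return 0 <= x < self.universe and (self.bits >> x) & 1 == 1

    def __iter__(self):
        return (i for i in range(self.universe) if (self.bits >> i) & 1)

    def __len__(self) -> int:
        return bin(self.bits).count("1")
```

For example, with `a = BitSet.from_iterable(16, [1, 3, 5, 7])` and `b = BitSet.from_iterable(16, [3, 4, 5])`, `list(a.union(b))` is `[1, 3, 4, 5, 7]` and `list(a.difference(b))` is `[1, 7]`. The trade-off is that the vector's size is set by the universe, not by the number of elements. This favors dense sets over sparse ones.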

          Published in

          MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
          October 2017
          850 pages
          ISBN: 9781450349529
          DOI: 10.1145/3123939

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall acceptance rate: 484 of 2,242 submissions (22%)
