Concurrent Data Structures for Near-Memory Computing

Authors:

Maurice Herlihy,

Onur Mutlu

SPAA '17: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures

Pages 235 - 245

https://doi.org/10.1145/3087556.3087582

Published: 24 July 2017

Abstract

The performance gap between memory and CPU has grown exponentially. To bridge this gap, hardware architects have proposed near-memory computing (also called processing-in-memory, or PIM), where a lightweight processor (called a PIM core) is located close to memory. Due to its proximity to memory, a memory access from a PIM core is much faster than that from a CPU core. New advances in 3D integration and die-stacked memory make PIM viable in the near future. Prior work has shown significant performance improvements by using PIM for embarrassingly parallel and data-intensive applications, as well as for pointer-chasing traversals in sequential data structures. However, current server machines have hundreds of cores, and algorithms for concurrent data structures exploit these cores to achieve high throughput and scalability, with significant benefits over sequential data structures. Thus, it is important to examine how PIM performs with respect to modern concurrent data structures and understand how concurrent data structures can be developed to take advantage of PIM.

This paper is the first to examine the design of concurrent data structures for PIM. We show two main results: (1) naive PIM data structures cannot outperform state-of-the-art concurrent data structures, such as pointer-chasing data structures and FIFO queues; and (2) novel designs for PIM data structures, using techniques such as combining, partitioning, and pipelining, can outperform traditional concurrent data structures while being significantly simpler in design.
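As a rough illustration of the partitioning idea mentioned in the abstract, the sketch below models a set whose key range is split into contiguous partitions, each owned by a single worker. It is not the paper's design: the class name `PartitionedSet`, the request format, and the use of Python threads and queues to stand in for PIM cores and CPU-to-PIM messages are all assumptions made for illustration. Because only the owning worker touches a partition, the partition's data needs no locks. Combining and pipelining are not shown.

```python
import queue
import threading
from concurrent.futures import Future


class PartitionedSet:
    """Hypothetical model: each partition is owned by one worker thread
    (standing in for a PIM core); client threads only send requests."""

    def __init__(self, num_partitions, key_space):
        self.num_partitions = num_partitions
        self.key_space = key_space  # assumed: keys are integers in [0, key_space)
        self.inboxes = [queue.Queue() for _ in range(num_partitions)]
        for index in range(num_partitions):
            threading.Thread(target=self._serve, args=(index,), daemon=True).start()

    def _owner(self, key):
        # Range partitioning: contiguous key ranges map to the same partition.
        if not 0 <= key < self.key_space:
            raise ValueError("key outside the assumed key space")
        return key * self.num_partitions // self.key_space

    def _serve(self, index):
        local = set()  # touched only by this worker, so no lock is needed
        inbox = self.inboxes[index]
        while True:
            op, key, reply = inbox.get()
            if op == "stop":
                reply.set_result(None)
                return
            if op == "add":
                result = key not in local
                local.add(key)
            elif op == "remove":
                result = key in local
                local.discard(key)
            else:  # "contains"
                result = key in local
            reply.set_result(result)

    def _request(self, op, key):
        # Send the operation to the owning worker and block until it replies.
        reply = Future()
        self.inboxes[self._owner(key)].put((op, key, reply))
        return reply.result()

    def add(self, key):
        return self._request("add", key)

    def remove(self, key):
        return self._request("remove", key)

    def contains(self, key):
        return self._request("contains", key)

    def close(self):
        # Ask every worker to stop, then wait for each acknowledgement.
        replies = []
        for inbox in self.inboxes:
            reply = Future()
            inbox.put(("stop", None, reply))
            replies.append(reply)
        for reply in replies:
            reply.result()
```

A caller would create, for example, `PartitionedSet(num_partitions=4, key_space=100)` and invoke `add`, `remove`, and `contains` from any number of client threads. Operations on keys in different partitions proceed in parallel, while operations on the same partition are serialized by its single owner.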

Published In

SPAA '17: Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures

July 2017

392 pages

ISBN: 9781450345934

DOI: 10.1145/3087556

General Chair: Christian Scheideler, Paderborn University, Germany

Program Chair: Mohammad Hajiaghayi, University of Maryland at College Park, USA

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
SIGARCH: ACM Special Interest Group on Computer Architecture
EATCS: European Association for Theoretical Computer Science

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPAA '17

SPAA '17: 29th ACM Symposium on Parallelism in Algorithms and Architectures

July 24 - 26, 2017

Washington, DC, USA

Acceptance Rates

SPAA '17 Paper Acceptance Rate: 31 of 127 submissions, 24%

Overall Acceptance Rate: 447 of 1,461 submissions, 31%
