DOI: 10.1145/3317550.3321433
research-article

Designing Far Memory Data Structures: Think Outside the Box

Published: 13 May 2019

Abstract

Technologies like RDMA and Gen-Z, which give access to memory outside the box, are gaining in popularity. These technologies provide the abstraction of far memory, where memory is attached to the network and can be accessed by remote processors without mediation by a local processor. Unfortunately, far memory is hard to use because existing data structures are mismatched to it. We argue that we need new data structures for far memory, borrowing techniques from concurrent data structures and distributed systems. We examine the requirements of these data structures and show how to realize them using simple hardware extensions.



Published In

HotOS '19: Proceedings of the Workshop on Hot Topics in Operating Systems
May 2019
227 pages
ISBN:9781450367271
DOI:10.1145/3317550
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

HotOS '19


Article Metrics

  • Downloads (last 12 months): 95
  • Downloads (last 6 weeks): 12
Reflects downloads up to 17 Feb 2025

Cited By

  • (2024) TianMen: a DPU-based storage network offloading structure for disaggregated datacenters. Proceedings of the 2024 ACM Symposium on Cloud Computing, pages 689-703. DOI: 10.1145/3698038.3698528. Online publication date: 20-Nov-2024
  • (2024) CHIME: A Cache-Efficient and High-Performance Hybrid Index on Disaggregated Memory. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 110-126. DOI: 10.1145/3694715.3695959. Online publication date: 4-Nov-2024
  • (2024) A Memory-Disaggregated Radix Tree. ACM Transactions on Storage, 20:3, pages 1-41. DOI: 10.1145/3664289. Online publication date: 6-Jun-2024
  • (2024) Scalable Distributed Inverted List Indexes in Disaggregated Memory. Proceedings of the ACM on Management of Data, 2:3, pages 1-27. DOI: 10.1145/3654974. Online publication date: 30-May-2024
  • (2024) DM-TEE: Trusted Execution Environment for Disaggregated Memory. Proceedings of the Great Lakes Symposium on VLSI 2024, pages 204-209. DOI: 10.1145/3649476.3658702. Online publication date: 12-Jun-2024
  • (2024) Effortless Locality on Data Systems Using Relational Fabric. IEEE Transactions on Knowledge and Data Engineering, 36:12, pages 7410-7422. DOI: 10.1109/TKDE.2024.3386827. Online publication date: Dec-2024
  • (2023) Patronus. Proceedings of the 21st USENIX Conference on File and Storage Technologies, pages 315-330. DOI: 10.5555/3585938.3585958. Online publication date: 21-Feb-2023
  • (2023) Building Write-Optimized Tree Indexes on Disaggregated Memory. ACM SIGMOD Record, 52:1, pages 45-52. DOI: 10.1145/3604437.3604448. Online publication date: 8-Jun-2023
  • (2023) Cowbird: Freeing CPUs to Compute by Offloading the Disaggregation of Memory. Proceedings of the ACM SIGCOMM 2023 Conference, pages 1060-1073. DOI: 10.1145/3603269.3604833. Online publication date: 10-Sep-2023
  • (2023) Partial Failure Resilient Memory Management System for (CXL-based) Distributed Shared Memory. Proceedings of the 29th Symposium on Operating Systems Principles, pages 658-674. DOI: 10.1145/3600006.3613135. Online publication date: 23-Oct-2023
