DOI: 10.1145/3514221.3526187
research-article

ScaleStore: A Fast and Cost-Efficient Storage Engine using DRAM, NVMe, and RDMA

Published: 11 June 2022

ABSTRACT

In this paper, we propose ScaleStore, a novel distributed storage engine that exploits DRAM caching, NVMe storage, and RDMA networking to achieve high performance, cost-efficiency, and scalability at the same time. Using low-latency RDMA messages, ScaleStore implements a transparent memory abstraction that provides access to the aggregated DRAM and NVMe storage of all nodes. In contrast to existing distributed RDMA designs such as NAM-DB or FaRM, ScaleStore stores cold data on NVMe SSDs (flash), significantly lowering the overall hardware cost. The core of ScaleStore is a distributed caching strategy that dynamically decides, based on the workload, which data to keep in memory and which on SSDs. The caching protocol also provides strong consistency in the presence of concurrent data modifications. Our evaluation shows that ScaleStore achieves high performance for various types of workloads (read- or write-dominated, uniform or skewed), even when the data size exceeds the aggregated memory of all nodes. We further show that ScaleStore handles dynamic workload changes efficiently and supports elasticity.
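The abstract names only the properties of the caching protocol: one memory abstraction over the DRAM of all nodes, cold pages on SSD, and strong consistency under concurrent writes. As a rough illustration of how such guarantees are commonly obtained, and explicitly not ScaleStore's actual protocol, the sketch below models a textbook directory-based invalidation scheme in a single process. A per-page directory entry records which nodes cache the page and whether one of them holds it exclusively. A writer invalidates all other copies first, and a modified page is written back to the simulated SSD when it is downgraded or evicted. All names (`ToyDirectory`, `Entry`, `read`, `write`, `evict`) are hypothetical, and "messages" are plain method calls rather than RDMA operations.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SHARED = "shared"        # read-only copies, possibly cached on several nodes
    EXCLUSIVE = "exclusive"  # exactly one writable copy


@dataclass
class Entry:
    """Directory state for one page (hypothetical structure)."""
    mode: Mode = Mode.SHARED
    holders: set = field(default_factory=set)  # ids of nodes caching the page in DRAM


class ToyDirectory:
    """Single-process model of a directory-based invalidation protocol.

    Names are hypothetical; "messages" are method calls, not RDMA operations,
    and concurrent requests are not modeled.
    """

    def __init__(self, num_nodes, ssd_pages):
        self.entries = {pid: Entry() for pid in ssd_pages}
        self.dram = [{} for _ in range(num_nodes)]  # per-node cache: page id -> value
        self.ssd = dict(ssd_pages)                   # cold copy of every page

    def _current_value(self, pid):
        # All DRAM copies are identical because writers invalidate the others,
        # so any holder can serve the page; otherwise fall back to the SSD copy.
        entry = self.entries[pid]
        if entry.holders:
            return self.dram[next(iter(entry.holders))][pid]
        return self.ssd[pid]

    def read(self, node, pid):
        entry = self.entries[pid]
        if node not in entry.holders:
            value = self._current_value(pid)
            if entry.mode is Mode.EXCLUSIVE:
                # Downgrade the exclusive owner and flush its possibly dirty copy,
                # so dropping shared copies later never loses the newest version.
                self.ssd[pid] = value
                entry.mode = Mode.SHARED
            entry.holders.add(node)
            self.dram[node][pid] = value
        return self.dram[node][pid]

    def write(self, node, pid, value):
        entry = self.entries[pid]
        if not (entry.mode is Mode.EXCLUSIVE and entry.holders == {node}):
            # Invalidate every other copy before granting exclusive access.
            for other in entry.holders - {node}:
                del self.dram[other][pid]
            entry.holders = {node}
            entry.mode = Mode.EXCLUSIVE
        self.dram[node][pid] = value

    def evict(self, node, pid):
        entry = self.entries[pid]
        if node not in entry.holders:
            return
        value = self.dram[node].pop(pid)
        if entry.mode is Mode.EXCLUSIVE:
            self.ssd[pid] = value  # write back the only, possibly dirty, copy
        entry.holders.discard(node)
        if not entry.holders:
            entry.mode = Mode.SHARED
```

Because every writer invalidates all other copies before modifying the page, any remaining DRAM copy is current, so a reader may fetch the page from whichever node holds it. A real distributed engine would additionally have to serialize concurrent requests for the same page, survive node failures, and implement the policy that decides which pages leave DRAM; this sketch models none of that.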

Published in: SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data, June 2022, 2597 pages, ISBN 9781450392495, DOI: 10.1145/3514221

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 785 of 4,003 submissions (20%)
