DOI:10.1145/3423211.3425672
research-article

JellyFish: A Fast Skip List with MVCC

Published: 11 December 2020

Abstract

Multi-version concurrency control is a widely employed concurrency control mechanism, as it allows non-blocking accesses while providing isolation among transactions. However, maintaining multiple versions increases the latency of both point lookups and range retrievals because of the overhead of finding the right version. In particular, the append-only skip list, which is widely used in state-of-the-art key-value stores (KVS), shows significant performance degradation due to its append-only nature.
This paper presents a novel skip list implementation called JellyFish. JellyFish reduces the overhead of multi-version concurrency control by separating the per-key updates from the key indexing. We implement our design on top of RocksDB and compare it against a wide variety of data structures. Our evaluation with micro-benchmarks and real-world workloads shows that JellyFish not only improves throughput by up to 93% but also reduces the latency of update operations by up to 42%.
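
The abstract does not spell out the data layout, so the following is only a minimal, single-threaded sketch of the general idea of separating per-key updates from key indexing. It assumes that each key appears once in the ordered index and that its versions hang off that entry in a chain ordered newest first. The names `VersionedIndex`, `KeyNode`, `put`, `get`, and `scan` are hypothetical, a sorted Python list stands in for the skip list, and nothing here is taken from the JellyFish or RocksDB code.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class KeyNode:
    """One index entry per key; its versions are stored newest first."""
    key: str
    versions: list = field(default_factory=list)  # [(seq, value), ...]


class VersionedIndex:
    """Illustrative stand-in: a sorted key list plays the role of the skip list."""

    def __init__(self):
        self._keys = []   # ordered index, one slot per distinct key
        self._nodes = {}  # key -> KeyNode holding that key's version chain
        self._seq = 0     # monotonically increasing sequence number

    def put(self, key, value):
        self._seq += 1
        node = self._nodes.get(key)
        if node is None:
            # Only the first write of a key touches the ordered index.
            node = KeyNode(key)
            self._nodes[key] = node
            bisect.insort(self._keys, key)
        # Later writes only prepend to the per-key chain.
        node.versions.insert(0, (self._seq, value))
        return self._seq

    def get(self, key, snapshot=None):
        node = self._nodes.get(key)
        if node is None:
            return None
        # Walk the chain until a version visible to the snapshot is found.
        for seq, value in node.versions:
            if snapshot is None or seq <= snapshot:
                return value
        return None

    def scan(self, start, end, snapshot=None):
        # Range scan over [start, end): one index step per key, not per version.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        result = []
        for key in self._keys[lo:hi]:
            value = self.get(key, snapshot)
            if value is not None:
                result.append((key, value))
        return result

    def index_size(self):
        return len(self._keys)
```

In this sketch, repeated updates to one key lengthen only that key's chain, so the ordered index keeps one entry per distinct key. An append-only layout would instead add one index entry per update, which lookups and scans then have to step over.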

Published In

Middleware '20: Proceedings of the 21st International Middleware Conference
December 2020
455 pages
ISBN:9781450381536
DOI:10.1145/3423211

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

Middleware '20
Middleware '20: 21st International Middleware Conference
December 7 - 11, 2020
Delft, Netherlands

Acceptance Rates

Overall Acceptance Rate 203 of 948 submissions, 21%
