Research article
DOI: 10.1145/3332466.3374526

Oak: a scalable off-heap allocated key-value map

Published: 19 February 2020

Abstract

Efficient ordered in-memory key-value (KV-)maps are paramount for the scalability of modern data platforms. In managed languages like Java, KV-maps face unique challenges due to the high overhead of garbage collection (GC).
We present Oak, a scalable concurrent KV-map for environments with managed memory. Oak offloads data from the managed heap, thereby reducing GC overheads and improving memory utilization. An important consideration in this context is the programming model since a standard object-based API entails moving data between the on- and off-heap spaces. In order to avoid the cost associated with such movement, we introduce a novel zero-copy (ZC) API. It provides atomic get, put, remove, and various conditional put operations such as compute (in-situ update).
We have released an open-source Java version of Oak. We further present a prototype Oak-based implementation of the internal multidimensional index in Apache Druid. Our experiments show that Oak is often 2x faster than Java's state-of-the-art concurrent skiplist.
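
To make the operations named in the abstract concrete, here is a minimal sketch in Java. It does not use Oak's API, which this page does not show. The first method runs put, a conditional put, an update, and remove on the JDK's on-heap `ConcurrentSkipListMap`, the baseline the abstract compares against. The second method illustrates what an in-place update of an off-heap value looks like, using a direct `ByteBuffer`. The class and method names (`OrderedMapSketch`, `baselineOps`, `inSituIncrement`) are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentSkipListMap;

public class OrderedMapSketch {

    // Baseline: the JDK's on-heap ordered concurrent map.
    // Keys and values are ordinary Java objects managed by the GC.
    static ConcurrentSkipListMap<String, Integer> baselineOps() {
        ConcurrentSkipListMap<String, Integer> map = new ConcurrentSkipListMap<>();
        map.put("banana", 2);
        map.put("apple", 1);
        // Conditional put: has no effect here because "apple" is already present.
        map.putIfAbsent("apple", 100);
        // Read-modify-write of an existing value. The JDK documents that the
        // function may be applied more than once under contention.
        map.compute("apple", (k, v) -> v == null ? 1 : v + 10);
        map.remove("banana");
        return map;
    }

    // Illustration only (an assumption, not Oak's mechanism): a value stored
    // in a direct buffer, whose contents may reside outside the GC-managed
    // heap, is updated in place without creating an intermediate object.
    static long inSituIncrement(long initial, long delta) {
        ByteBuffer cell = ByteBuffer.allocateDirect(Long.BYTES);
        cell.putLong(0, initial);
        cell.putLong(0, cell.getLong(0) + delta);
        return cell.getLong(0);
    }

    public static void main(String[] args) {
        System.out.println(baselineOps());
        System.out.println(inSituIncrement(5L, 3L));
    }
}
```

The single-threaded direct-buffer update above is not atomic; a concurrent map that stores values off-heap has to add its own synchronization around such in-place writes.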

Cited By

  • (2024) VisionEmbedder: Bit-Level-Compact Key-Value Storage with Constant Lookup, Rapid Updates, and Rare Failure. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4248-4261. DOI: 10.1109/ICDE60146.2024.00324. Online publication date: 13-May-2024
  • (2021) KVCG. Proceedings of the 14th ACM International Conference on Systems and Storage, 1-12. DOI: 10.1145/3456727.3463779. Online publication date: 14-Jun-2021

Published In

PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2020
454 pages
ISBN: 9781450368186
DOI: 10.1145/3332466
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. concurrent data structures
  2. key-value maps
  3. memory management

Qualifiers

  • Research-article

Conference

PPoPP '20

Acceptance Rates

PPoPP '20 Paper Acceptance Rate 28 of 121 submissions, 23%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%
