Abstract
Non-volatile memory (NVM) technologies used as persistent memory are promising candidates to complement or replace dynamic random access memory (DRAM) in future memory systems, because they offer high density, low power consumption, and non-volatility. In main memory systems, hashing index structures are fundamental building blocks that provide fast query responses. However, hashing index structures originally designed for DRAM become inefficient on persistent memory because of new challenges, including the hardware limitations of NVM and the requirement of data consistency. To address these challenges, this article proposes level hashing, a write-optimized and high-performance hashing index scheme with a low-overhead consistency guarantee and cost-efficient resizing. Level hashing provides a sharing-based two-level hash table, which achieves constant-scale worst-case time complexity for search, insertion, deletion, and update operations, and rarely incurs extra NVM writes. To guarantee consistency with low overhead, level hashing leverages log-free consistency schemes for deletion, insertion, and resizing operations, and an opportunistic log-free scheme for update operations. To resize the hash table cost-efficiently, level hashing leverages an in-place resizing scheme that rehashes only 1/3 of the buckets, instead of the entire table, to expand a hash table, and rehashes 2/3 of the buckets to shrink a hash table, thus reducing the number of rehashed buckets and significantly improving resizing performance. Extensive experimental results show that level hashing speeds up insertions by 1.4× to 3.0×, updates by 1.2× to 2.1×, expanding by over 4.3×, and shrinking by over 1.4×, while maintaining high search and deletion performance compared with state-of-the-art hashing schemes.
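The 1/3 and 2/3 resizing fractions quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions the abstract does not state: that the top level holds twice as many buckets as the bottom level, that expansion allocates a new, larger top level and rehashes only the old bottom level, and that shrinking rehashes only the old top level. The function names are hypothetical.

```python
from fractions import Fraction


def rehash_fraction_on_expand(top_buckets: int) -> Fraction:
    """Fraction of existing buckets rehashed by an in-place expansion.

    Assumption (not stated in the abstract): the bottom level has half as
    many buckets as the top level, and only the old bottom level is rehashed
    while the old top level is reused as the new bottom level.
    """
    if top_buckets <= 0 or top_buckets % 2 != 0:
        raise ValueError("top_buckets must be a positive even number")
    bottom_buckets = top_buckets // 2
    total_buckets = top_buckets + bottom_buckets
    return Fraction(bottom_buckets, total_buckets)


def rehash_fraction_on_shrink(top_buckets: int) -> Fraction:
    """Fraction of existing buckets rehashed by an in-place shrink.

    Assumption: the old bottom level is reused as the new top level, so only
    the old top level has to be rehashed.
    """
    if top_buckets <= 0 or top_buckets % 2 != 0:
        raise ValueError("top_buckets must be a positive even number")
    bottom_buckets = top_buckets // 2
    total_buckets = top_buckets + bottom_buckets
    return Fraction(top_buckets, total_buckets)


if __name__ == "__main__":
    print(rehash_fraction_on_expand(1024))  # 1/3
    print(rehash_fraction_on_shrink(1024))  # 2/3
```

Under these assumptions the fractions do not depend on the table size, which is consistent with the abstract stating them as fixed ratios.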
Index Terms
- Level Hashing: A High-performance and Flexible-resizing Persistent Hashing Index Structure