research-article
DOI: 10.1145/3559009.3569652

Tiered Hashing: Revamping Hash Indexing under a Unified Memory-Storage Hierarchy

Published: 27 January 2023

Abstract

NAND flash-based Solid State Drives (SSDs) provide a promising opportunity to enable the unified memory-storage hierarchy (UMH). The UMH presents a single memory address space for heterogeneous memories. Thus, CPUs can directly access structured data in SSDs, eliminating bulk data copies and swaps between memory and storage devices. However, applying traditional indexing structures directly on SSDs may lead to poor performance. In particular, the popular hash indexing generates highly randomized write traffic, incurring significant garbage collection (GC) overhead in SSDs. To address this problem, we propose a novel SSD-friendly hash indexing scheme called Tiered Hashing. It employs a multi-layer structure and opportunistic data movement (ODM) to construct skewed writes. Hence, the SSD can transform the writes into multi-streamed writes, where hot and cold data are separated to reduce GC overhead. Experimental results show that Tiered Hashing reduces the average write latency and GC overhead by up to 94.98% and 90.71%, respectively, compared to state-of-the-art hash indexing schemes, without sacrificing read performance.

Cited By

  • (2024) TieredHM: Hotspot-Optimized Hash Indexing for Memory-Semantic SSD-Based Hybrid Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43:6 (1755-1768). DOI: 10.1109/TCAD.2024.3354693. Online publication date: Jun-2024

Published In

PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques
October 2022
569 pages
ISBN: 9781450398688
DOI: 10.1145/3559009

In-Cooperation

  • IFIP WG 10.3
  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hash indexing
  2. multi-stream SSD
  3. unified memory-storage hierarchy

Qualifiers

  • Research-article

Funding Sources

  • Alibaba Group
  • National Natural Science Foundation of China

Conference

PACT '22

Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
