research-article

Exploiting Flat Namespace to Improve File System Metadata Performance on Ultra-Fast, Byte-Addressable NVMs

Published: 30 January 2024

Abstract

Conventional file systems provide a hierarchical namespace by structuring it as a directory tree. This tree-based namespace structure leads to inefficient file path walks and expensive namespace tree traversals, underutilizing the ultra-low access latency and superior sequential performance of non-volatile memories (NVMs). This article proposes FlatFS+, an NVM file system that features a flat namespace architecture while providing a compatible hierarchical namespace view. FlatFS+ incorporates three novel techniques, namely the direct file path walk model, the range-optimized Br tree, and a compressed index key design with scan and write dual optimization, to fully exploit the flat namespace and improve file system metadata performance on ultra-fast, byte-addressable NVMs. Evaluation results demonstrate that FlatFS+ achieves significant performance improvements over other file systems for metadata-intensive benchmarks and real-world applications.
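
The abstract contrasts component-by-component path walks over a directory tree with a flat namespace that still presents a hierarchical view. The Python sketch below is only a toy illustration of that contrast, under loudly stated assumptions: an in-memory sorted list and dict stand in for FlatFS+'s NVM-resident range-optimized Br tree, keys are uncompressed full paths, and the names `FlatNamespace`, `build_tree`, and `tree_lookup` are hypothetical. It is not the authors' implementation.

```python
from bisect import bisect_left, insort


class FlatNamespace:
    """Toy flat namespace (hypothetical): one sorted index keyed by full path."""

    def __init__(self):
        self._keys = []    # full paths kept in lexicographic order
        self._inodes = {}  # full path -> inode number

    def create(self, path, ino):
        if path not in self._inodes:
            insort(self._keys, path)
        self._inodes[path] = ino

    def lookup(self, path):
        # Direct path walk: a single index probe on the whole path,
        # regardless of how many components the path has.
        return self._inodes.get(path)

    def readdir(self, dirpath):
        # Hierarchical view recovered from the flat index: every key under
        # dirpath sorts into one contiguous range starting at the prefix.
        prefix = dirpath.rstrip("/") + "/"
        i = bisect_left(self._keys, prefix)
        children = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            rest = self._keys[i][len(prefix):]
            if "/" not in rest:  # direct child; skip deeper descendants
                children.append(rest)
            i += 1
        return children


def build_tree(entries):
    """Build a nested-dict directory tree from {full_path: ino} for comparison."""
    root = {"ino": 1, "children": {}}
    for path in sorted(entries):
        node = root
        for name in [c for c in path.split("/") if c]:
            node = node["children"].setdefault(name, {"ino": None, "children": {}})
        node["ino"] = entries[path]
    return root


def tree_lookup(root, path):
    """Conventional component-by-component walk; returns (ino, probes)."""
    node, probes = root, 0
    for name in [c for c in path.split("/") if c]:
        probes += 1
        node = node["children"].get(name)
        if node is None:
            return None, probes
    return node["ino"], probes


entries = {"/home": 2, "/home/alice": 3, "/home/alice/notes.txt": 4, "/home/bob": 5}
ns = FlatNamespace()
for p, ino in entries.items():
    ns.create(p, ino)

print(ns.lookup("/home/alice/notes.txt"))                         # 4 (one probe)
print(tree_lookup(build_tree(entries), "/home/alice/notes.txt"))  # (4, 3)
print(ns.readdir("/home"))                                        # ['alice', 'bob']
```

In this naive version, `readdir` scans every descendant under the prefix and discards grandchildren, so listing a directory with a deep subtree costs more than its direct children alone. The abstract's compressed index key design with scan optimization suggests FlatFS+ targets costs of this kind, but the abstract does not say how, so the sketch makes no attempt to model it.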

    Published In

    ACM Transactions on Storage, Volume 20, Issue 1
    February 2024
    198 pages
    EISSN: 1553-3093
    DOI: 10.1145/3613537

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 January 2024
    Online AM: 06 September 2023
    Accepted: 22 August 2023
    Revised: 19 May 2023
    Received: 29 December 2022
    Published in TOS Volume 20, Issue 1

    Author Tags

    1. File system
    2. non-volatile memory
    3. metadata management

    Qualifiers

    • Research-article

    Funding Sources

    • Fundamental Research Funds for the Central Universities
    • National Natural Science Foundation of China
    • Natural Science Foundation of Jiangsu Province
    • CCF-Huawei Innovation Research Plan
    • Future Network Scientific Research Fund Project
    • China Postdoctoral Science Foundation
    • Jiangsu Planned Projects for Postdoctoral Research Funds

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 646
    • Downloads (last 12 months): 468
    • Downloads (last 6 weeks): 67
    Reflects downloads up to 20 Jan 2025
