research-article

TPFS: A High-Performance Tiered File System for Persistent Memories and Disks

Published: 06 March 2023

Abstract

Emerging fast, byte-addressable persistent memory (PM) promises substantial storage performance gains compared with traditional disks. We present TPFS, a tiered file system that combines PM and slow disks to create a storage system with near-PM performance and large capacity. TPFS steers incoming file input/output (I/O) to PM, dynamic random access memory (DRAM), or disk depending on the synchronicity, write size, and read frequency. TPFS profiles the application’s access stream online to predict the behavior of file access. In the background, TPFS estimates the “temperature” of file data and migrates the write-cold and read-hot file data from PM to disks. To fully utilize disk bandwidth, TPFS coalesces data blocks into large, sequential writes. Experimental results show that with a small amount of PM and a large solid-state drive (SSD), TPFS achieves up to 7.3× and 7.9× throughput improvement compared with EXT4 and XFS running on an SSD alone, respectively. As the amount of PM grows, TPFS’s performance improves until it matches the performance of a PM-only file system.
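
The abstract states that TPFS coalesces data blocks into large, sequential disk writes but does not say how. As a minimal sketch of the general idea only, and not the authors' implementation, the Python below groups block numbers into runs of consecutive blocks so that each run could be issued as one sequential write. The function name `coalesce_blocks`, the 4 KiB block size, and the `max_run_blocks` cap are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

BLOCK_SIZE = 4096  # assumed block size in bytes; not stated in the abstract


def coalesce_blocks(block_numbers: Iterable[int],
                    max_run_blocks: int = 512) -> List[Tuple[int, int]]:
    """Group block numbers into (start_block, length) runs.

    Consecutive block numbers are merged into one run so that each run
    can be written with a single sequential I/O. Runs are capped at
    max_run_blocks, a hypothetical tuning knob.
    """
    runs: List[Tuple[int, int]] = []
    for blk in sorted(set(block_numbers)):
        if runs:
            start, length = runs[-1]
            # Extend the current run if this block directly follows it
            # and the run has not reached the cap.
            if blk == start + length and length < max_run_blocks:
                runs[-1] = (start, length + 1)
                continue
        runs.append((blk, 1))
    return runs


if __name__ == "__main__":
    # Hypothetical set of blocks selected for migration to disk.
    selected_blocks = [7, 3, 4, 5, 20, 21, 6]
    for start, length in coalesce_blocks(selected_blocks):
        print(f"write {length * BLOCK_SIZE} bytes starting at block {start}")
```

With the sample input, seven scattered block writes collapse into two sequential writes, one covering blocks 3 to 7 and one covering blocks 20 to 21. A real file system would also have to track which file and offset each block belongs to, which this sketch leaves out.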

Published in

ACM Transactions on Storage, Volume 19, Issue 2
May 2023
269 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3585541

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 6 March 2023
• Online AM: 13 January 2023
• Accepted: 15 December 2022
• Revised: 21 June 2022
• Received: 7 October 2021