The Conquest file system: Better performance through a disk/persistent-RAM hybrid design

Published: 01 August 2006

Abstract

Modern file systems assume the use of disk, a system-wide performance bottleneck for over a decade. Current disk caching and RAM file systems either impose high overhead to access memory content or fail to provide mechanisms to achieve data persistence across reboots.

The Conquest file system is based on the observation that memory is becoming inexpensive, which enables all file system services to be delivered from memory, except for providing large storage capacity. Unlike caching, Conquest uses memory with battery backup as persistent storage, and provides specialized and separate data paths to memory and disk. Therefore, the memory data path contains no disk-related complexity. The disk data path consists of optimizations only for the specialized disk usage pattern.

Compared to a memory-based file system, Conquest incurs little performance overhead. Compared to several disk-based file systems, Conquest achieves 1.3x to 19x faster memory performance, and 1.4x to 2.0x faster performance when exercising both memory and disk.

Conquest realizes most of the benefits of persistent RAM at a fraction of the cost of a RAM-only solution. It also demonstrates that disk-related optimizations impose high overheads for accessing memory content in a memory-rich environment.

Reviews

Suma Adabala

As dynamic random access memory (DRAM) gets cheaper, larger memories are typically used as buffers to hide input/output (I/O) latency to disk. The Conquest file system is a novel approach for the more effective use of cheap DRAM. It is designed so that battery-backed DRAM serves as a persistent store for small files and file system services, while the slower disks serve as a store for large files. Rather than adapting existing file system solutions, as in the case of random access memory (RAM) file systems or RAM-based disk emulators, the authors make a case for redesigning the file system around persistent RAM. Conquest has a simpler data path to small files and metadata in memory, one that bypasses the I/O buffer and disk management found in conventional disk-based file systems. The performance evaluation of Conquest shows up to a 19-times improvement in memory performance compared to file systems designed for disks, supporting the need for file system redesign to better exploit memory performance.

Based on a variety of prior studies of file access patterns and file size distribution, the strategy for delegating files to a storage medium uses a file-size threshold. Files with sizes below the threshold are delegated to persistent RAM, while those larger than the threshold are stored on disk. The performance gain achieved with Conquest for workloads that exercise both disk and memory supports this simple design decision. The large-file layout on disk is optimized for sequential rather than random access, making Conquest disk access optimal for multimedia files, a significant component of current and future workloads. An implementation of Conquest as a loadable module in the Linux 2.4.2 kernel is available; however, due to issues such as lower reliability and lack of a garbage collector implementation, persistent DRAM must be cost-effective before it can be deployed.

This paper is definitely worth reading for operating system designers. It demonstrates a successful redesign of a system component, the Conquest file system, after re-evaluating underlying assumptions, namely, file system optimizations for disks, in the context of changes to the system organization, namely, a memory-rich storage hierarchy.

Online Computing Reviews Service
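
The threshold-based placement described in the review can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not Conquest's actual implementation: the class name `HybridStore`, the 1 MiB default threshold, the two dictionaries standing in for battery-backed DRAM and disk, and the choice to send files exactly at the threshold to disk are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the review only says that a file-size threshold exists.
SIZE_THRESHOLD_BYTES = 1 << 20


@dataclass
class HybridStore:
    """Toy model of threshold-based placement between persistent RAM and disk."""

    threshold: int = SIZE_THRESHOLD_BYTES
    ram: dict = field(default_factory=dict)   # stands in for battery-backed DRAM
    disk: dict = field(default_factory=dict)  # stands in for the large-file disk store

    def medium_for(self, size: int) -> str:
        # Assumption: a file exactly at the threshold goes to disk.
        return "ram" if size < self.threshold else "disk"

    def write(self, name: str, data: bytes) -> str:
        # Drop any stale copy first, since a rewrite may cross the threshold.
        self.ram.pop(name, None)
        self.disk.pop(name, None)
        medium = self.medium_for(len(data))
        (self.ram if medium == "ram" else self.disk)[name] = data
        return medium

    def read(self, name: str) -> bytes:
        # The memory path is checked first and involves no disk bookkeeping.
        if name in self.ram:
            return self.ram[name]
        return self.disk[name]
```

With a deliberately tiny threshold, `HybridStore(threshold=8).write("note", b"1234")` places the file in the RAM dictionary, while a 16-byte payload lands in the disk dictionary. The sketch leaves out metadata handling, which the review says also lives in memory.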

Published in

ACM Transactions on Storage, Volume 2, Issue 3
August 2006
149 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/1168910

Copyright © 2006 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

            Qualifiers

            • article
