DOI: 10.1145/1654059.1654086
research-article

Adaptive and scalable metadata management to support a trillion files

Published: 14 November 2009

Abstract

A growing number of applications require file systems to efficiently maintain a million or more files. Providing high access performance with such a huge number of files and such large directories is a major challenge for cluster file systems. Limited by static directory structures, existing file systems are prohibitively inefficient for this use. To address this problem, we present a scalable and adaptive metadata management system that aims to maintain a trillion files efficiently. First, our system exploits adaptive two-level directory partitioning based on extendible hashing to manage very large directories. Second, it uses fine-grained parallel processing within a directory, which greatly improves the performance of file creation and deletion. Third, it uses multi-layered metadata cache management, which improves memory utilization on the servers. Finally, it uses a dynamic load-balancing mechanism based on consistent hashing, which enables the system to scale up and down easily.
Our performance results on 32 metadata servers show that our user-level prototype implementation can create more than 74 thousand files per second and can retrieve the attributes of more than 270 thousand files per second in a single directory with 100 million files. Moreover, it delivers a peak throughput of more than 60 thousand file creates per second in a single directory with 1 billion files.
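
The abstract names consistent hashing as the basis of the load-balancing mechanism but gives no details. As a rough illustration of the general technique only, and not of the authors' implementation, the following minimal Python sketch places metadata servers on a hash ring and maps keys to the next server clockwise. The class name `ConsistentHashRing`, the `vnodes` parameter, the server names `mds0` to `mds4`, the key format, and the choice of MD5 as ring hash are all assumptions made for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical consistent-hash ring; not the paper's actual data structure."""

    def __init__(self, servers=(), vnodes=64):
        self.vnodes = vnodes   # virtual nodes per server, smooths the load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        """Place `vnodes` points for the server on the ring."""
        for i in range(self.vnodes):
            point = ring_hash(f"{server}#{i}")
            if point in self._owner:   # position already taken (very unlikely)
                continue
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove_server(self, server: str) -> None:
        """Drop all points owned by the server; other points are untouched."""
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key: str) -> str:
        """Return the server owning the first ring point clockwise from the key."""
        if not self._points:
            raise LookupError("ring has no servers")
        i = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[i]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["mds0", "mds1", "mds2", "mds3"])
    keys = [f"dir42/partition{i}" for i in range(1000)]   # made-up partition names
    before = {k: ring.lookup(k) for k in keys}
    ring.add_server("mds4")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding one server")
```

The property that makes this attractive for scaling up and down is that adding a server only moves keys onto the new server, and removing it moves only that server's keys; all other assignments stay where they were.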

Published In

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009
778 pages
ISBN: 9781605587448
DOI: 10.1145/1654059
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC '09

Acceptance Rates

SC '09 Paper Acceptance Rate 59 of 261 submissions, 23%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2023) CFS: Scaling Metadata Service for Distributed File System via Pruned Scope of Critical Sections. Proceedings of the Eighteenth European Conference on Computer Systems, pp. 331-346. DOI: 10.1145/3552326.3587443. Online publication date: 8-May-2023
  • (2023) MUSE: A Programmable Metadata Load Estimation Interface for Ceph File System. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), pp. 1951-1958. DOI: 10.1109/ICPADS60453.2023.00267. Online publication date: 17-Dec-2023
  • (2021) Lunule. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1145/3458817.3476196. Online publication date: 14-Nov-2021
  • (2021) DeltaFS. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1145/3458817.3476148. Online publication date: 14-Nov-2021
  • (2021) Workload Evaluation Tool for Metadata Distribution Method. Simulation Tools and Techniques, pp. 796-810. DOI: 10.1007/978-3-030-72792-5_63. Online publication date: 27-Apr-2021
  • (2021) Ad-Hoc File Systems At Extreme Scales. High Performance Computing in Science and Engineering '19, pp. 537-549. DOI: 10.1007/978-3-030-66792-4_36. Online publication date: 30-May-2021
  • (2020) DelveFS - An Event-Driven Semantic File System for Object Stores. 2020 IEEE International Conference on Cluster Computing (CLUSTER), pp. 35-46. DOI: 10.1109/CLUSTER49012.2020.00014. Online publication date: Sep-2020
  • (2020) GekkoFS — A Temporary Burst Buffer File System for HPC Applications. Journal of Computer Science and Technology 35:1, pp. 72-91. DOI: 10.1007/s11390-020-9797-6. Online publication date: 17-Jan-2020
  • (2020) HSM²: A Hybrid and Scalable Metadata Management Method in Distributed File Systems. Parallel Architectures, Algorithms and Programming, pp. 195-206. DOI: 10.1007/978-981-15-2767-8_19. Online publication date: 26-Jan-2020
  • (2020) ADA-FS—Advanced Data Placement via Ad hoc File Systems at Extreme Scales. Software for Exascale Computing - SPPEXA 2016-2019, pp. 29-59. DOI: 10.1007/978-3-030-47956-5_4. Online publication date: 31-Jul-2020
