
Performance and scalability evaluation of the Ceph parallel file system

Published: 17 November 2013
DOI: 10.1145/2538542.2538562

Abstract

Ceph is an emerging open-source parallel distributed file and storage system. By design, Ceph leverages unreliable commodity storage and network hardware, and provides reliability and fault-tolerance via controlled object placement and data replication. This paper presents our file and block I/O performance and scalability evaluation of Ceph for scientific high-performance computing (HPC) environments. Our work makes two unique contributions. First, our evaluation is performed under a realistic setup for a large-scale capability HPC environment using a commercial high-end storage system. Second, our path of investigation, tuning efforts, and findings made direct contributions to Ceph's development and improved code quality, scalability, and performance. These changes should benefit both Ceph and the HPC community at large.




Published In

PDSW '13: Proceedings of the 8th Parallel Data Storage Workshop
November 2013
55 pages
ISBN: 978-1-4503-2505-9
DOI: 10.1145/2538542

Publisher

Association for Computing Machinery, New York, NY, United States


Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

PDSW '13 paper acceptance rate: 8 of 16 submissions (50%)
Overall acceptance rate: 17 of 41 submissions (41%)


Cited By

  • (2021) Supporting SLA via Adaptive Mapping and Heterogeneous Storage Devices in Ceph. Electronics 10(7), 847. DOI: 10.3390/electronics10070847. 2 Apr 2021.
  • (2020) MDLB: a metadata dynamic load balancing mechanism based on reinforcement learning. Frontiers of Information Technology & Electronic Engineering 21(7), 1034-1046. DOI: 10.1631/FITEE.1900121. 29 Jul 2020.
  • (2020) SLA-Aware Adaptive Mapping Scheme in Bigdata Distributed Storage Systems. In The 9th International Conference on Smart Media and Applications, 135-140. DOI: 10.1145/3426020.3426053. 17 Sep 2020.
  • (2020) On Fault Tolerance, Locality, and Optimality in Locally Repairable Codes. ACM Transactions on Storage 16(2), 1-32. DOI: 10.1145/3381832. 22 May 2020.
  • (2020) A Content Fingerprint-Based Cluster-Wide Inline Deduplication for Shared-Nothing Storage Systems. IEEE Access 8, 209163-209180. DOI: 10.1109/ACCESS.2020.3039056. 2020.
  • (2020) Performance analysis of distributed storage clusters based on kernel and userspace traces. Software: Practice and Experience 51(1), 5-24. DOI: 10.1002/spe.2889. 7 Sep 2020.
  • (2019) A New Approach to Double I/O Performance for Ceph Distributed File System in Cloud Computing. In 2019 2nd International Conference on Data Intelligence and Security (ICDIS), 68-75. DOI: 10.1109/ICDIS.2019.00018. Jun 2019.
  • (2019) Towards Self-Managing Cloud Storage with Reinforcement Learning. In 2019 IEEE International Conference on Cloud Engineering (IC2E), 34-44. DOI: 10.1109/IC2E.2019.000-9. Jun 2019.
  • (2019) Optimizing communication performance in scale-out storage system. Cluster Computing 22(2), 335-346. DOI: 10.1007/s10586-018-2831-6. 1 Jun 2019.
  • (2018) Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace. In 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 960-969. DOI: 10.1109/IPDPS.2018.00105. May 2018.
