Abstract:
The past decade has witnessed the rapid growth of cloud computing. Many public cloud infrastructures have been deployed and serve millions of tenants. Cloud file systems, which take charge of petabyte-scale data storage, play a crucial role in the performance of cloud infrastructures. Typical cloud file systems, including GFS, HDFS and Ceph, have attracted notable research efforts for performance evaluation and optimization. However, due to the heterogeneity and complexity of I/O workload characteristics in cloud environments, it is still challenging to conduct an accurate and efficient performance evaluation. To address this problem, we collected a two-week I/O workload trace from a 2,500-node production cluster in AliCloud, which is one of the largest cloud providers in Asia. Using the AliCloud trace, we characterized the I/O workload and data distribution, and compared two cloud services from multiple perspectives, including the request arrival pattern, request size, and data population. A list of observations and implications was derived and applied to help design a cloud file system benchmarking suite, called Porcupine. Porcupine aims to provide scalable and efficient performance evaluation of cloud file systems using realistic I/O workloads. We conducted a group of validation experiments, which demonstrated that Porcupine can achieve high accuracy and scalability. This paper provides our experiences and lessons in generating I/O workloads and deploying performance tests on cloud file systems, which we believe will be insightful to the cloud computing community in general.
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 28, Issue: 11, 01 November 2017)