ABSTRACT
Recent years have witnessed the rapid development of file and storage systems. Many improvements to these systems are inspired by workload analysis, which reveals the characteristics of I/O behavior. Although cloud storage systems are becoming increasingly prominent, few real-world, large-scale cloud storage workload studies have been presented. Alibaba Cloud is one of the world's largest cloud providers, and we have collected and analyzed workloads from Alibaba for an extended period. We observe that modern cloud network architecture can easily handle the peak load during busy festivals. However, the client layer is the system bottleneck during the peak period, which calls for further optimization. We also find that the workload is heavily skewed toward a small percentage of virtual disks, and that its distribution conforms to the 80/20 rule. In summary, the characteristics of such a large-scale cloud storage system in production environments are important for future cloud storage system modifications.
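As an illustration of the 80/20 observation, the following minimal sketch computes the share of requests issued by the busiest 20% of virtual disks. The function name `top_share` and the sample counts are hypothetical and are not taken from the paper's data or tooling; the sketch only shows one simple way such skew could be checked.

```python
def top_share(io_counts, top_fraction=0.2):
    """Return the fraction of total I/O issued by the busiest `top_fraction` of disks."""
    if not io_counts:
        raise ValueError("io_counts must not be empty")
    # Rank disks from busiest to least busy.
    ranked = sorted(io_counts, reverse=True)
    # Always count at least one disk as "top".
    k = max(1, int(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical per-virtual-disk request counts (illustrative only, not measured data).
sample = [800, 50, 40, 30, 20, 20, 15, 10, 10, 5]
print(f"top 20% of disks issue {top_share(sample):.0%} of requests")
```

A result near or above 0.8 for `top_fraction=0.2` would be consistent with an 80/20 distribution; real traces would need per-disk counts aggregated over a chosen time window.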
Index Terms
- Client layer becomes bottleneck: workload analysis of an ultra-large-scale cloud storage system