DOI: 10.1145/3492323.3495625
research-article

Client layer becomes bottleneck: workload analysis of an ultra-large-scale cloud storage system

Published: 07 February 2022

ABSTRACT

Recent years have witnessed the rapid development of file and storage systems. Many improvements to these systems are inspired by workload analysis, which reveals the characteristics of I/O behavior. Although cloud storage systems are becoming increasingly prominent, few real-world, large-scale cloud storage workload studies have been presented. Alibaba Cloud is one of the world's largest cloud providers, and we have collected and analyzed workloads from Alibaba for an extended period. We observe that modern cloud network architecture can easily handle the peak load during busy festivals. However, the client layer is the system bottleneck during the peak period, which calls for further optimization. We also find that the workload is heavily skewed toward a small percentage of virtual disks, and its distribution conforms to the 80/20 rule. In summary, the characteristics of such a large-scale cloud storage system in production environments are important for future cloud storage system modifications.
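
As an illustration of the skew described in the abstract, the following minimal sketch shows one way to check whether per-disk load follows an 80/20 pattern. It is not the authors' method: the function name `top_share`, the `top_fraction` parameter, and the request counts are hypothetical, and the numbers are made up rather than taken from the paper.

```python
def top_share(io_counts, top_fraction=0.2):
    """Return the fraction of total I/O issued by the busiest disks.

    io_counts: per-virtual-disk request counts (hypothetical input format).
    top_fraction: share of disks treated as the "hot" set, 0.2 for an 80/20 check.
    """
    if not io_counts:
        raise ValueError("io_counts must not be empty")
    ranked = sorted(io_counts, reverse=True)
    # Always count at least one disk as part of the hot set.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical per-virtual-disk request counts, not data from the paper.
io_counts = [5000, 3000, 500, 400, 300, 250, 200, 150, 120, 80]
share = top_share(io_counts)
print(f"Top 20% of disks issue {share:.0%} of requests")
```

A result near 0.8 for `top_fraction=0.2` would be consistent with the 80/20 rule; a much lower value would indicate a flatter distribution.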

Published in

UCC '21: Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion
December 2021
256 pages
ISBN: 9781450391634
DOI: 10.1145/3492323

Copyright © 2021 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 38 of 125 submissions, 30%