ABSTRACT
Representative modeling of I/O activity is crucial when designing large-scale distributed storage systems. Particularly important use cases are counterfactual "what-if" analyses that assess the impact of anticipated or hypothetical new storage policies or hardware prior to deployment. We propose Thesios, a methodology to accurately synthesize such hypothetical full-resolution I/O traces by carefully combining down-sampled I/O traces collected from multiple disks attached to multiple storage servers. Applying this approach to real-world traces that are already routinely sampled at Google, we show that our synthesized traces achieve 95--99.5% accuracy in read/write request numbers, 90--97% accuracy in utilization, and 80--99.8% accuracy in read latency compared to metrics collected from actual disks. We demonstrate how Thesios enables diverse counterfactual I/O trace synthesis and analyses of hypothetical policy, hardware, and server changes through four case studies: (1) studying the effects of changing a disk's utilization, fullness, and capacity, (2) evaluating a new data placement policy, (3) analyzing the impact on power and performance of deploying disks with reduced rotations-per-minute (RPM), and (4) understanding the impact of increased buffer cache size on a storage server. Without Thesios, such counterfactual analyses would require costly and potentially risky A/B experiments in production.
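To make the core idea concrete, the following is a minimal sketch of combining down-sampled traces from several disks into one time-ordered trace for a hypothetical disk. It is an illustration only, not the Thesios implementation: the `IORequest` record, the `synthesize_trace` and `count_accuracy` helpers, and the accuracy definition (one minus relative error) are all assumptions introduced here, and the sketch ignores the careful selection and reweighting of source disks that the abstract alludes to.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class IORequest:
    """Hypothetical trace record; field names are assumptions for this sketch."""
    timestamp: float   # seconds since the start of the trace window
    disk_id: str       # disk the request was observed on
    op: str            # "read" or "write"
    size_bytes: int


def synthesize_trace(sampled_traces, target_disk="synthetic-disk"):
    """Merge per-disk sampled traces into one time-ordered synthetic trace.

    Each input trace must already be sorted by timestamp. Every request is
    relabeled as if it had been issued to the single hypothetical disk.
    """
    merged = heapq.merge(*sampled_traces, key=lambda r: r.timestamp)
    return [IORequest(r.timestamp, target_disk, r.op, r.size_bytes) for r in merged]


def count_accuracy(synthetic_count, actual_count):
    """Assumed accuracy metric: 1 minus the relative error in request counts."""
    return 1.0 - abs(synthetic_count - actual_count) / actual_count


# Two disks, each contributing a sparse sample of its requests.
disk_a = [IORequest(0.1, "a", "read", 4096), IORequest(0.7, "a", "write", 8192)]
disk_b = [IORequest(0.3, "b", "read", 4096), IORequest(0.9, "b", "read", 16384)]

synthetic = synthesize_trace([disk_a, disk_b])
reads = sum(1 for r in synthetic if r.op == "read")
```

Under this assumed model, sampling roughly 1 in N requests on each of N comparable disks and merging the samples yields a trace whose request volume approximates that of one fully traced disk; how closely it matches real disks is exactly what the reported accuracy figures measure.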