Abstract
Modern, large-scale scientific computing runs on complex exascale storage systems that support even more complex data workloads. Understanding data access and movement patterns is vital for informing the design of future iterations of existing systems and of next-generation systems. Yet we lack publicly available traces and tools to help us understand even one system in depth, let alone to correlate long-term cross-system trends.
Analysis and Workload Characterization of the CERN EOS Storage System
CHEOPS '22: Proceedings of the Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems