ABSTRACT
Provisioning, resource allocation, and configuration decisions for I/O-intensive workflow applications are complex even for expert users. Users face choices at multiple levels: how to allocate resources to individual sub-systems (e.g., the application layer, the storage layer) and how to configure each of them (e.g., replication level, chunk size, and caching policies in the case of storage). All of these choices have a large impact on overall application performance. This paper presents our progress on supporting these provisioning, allocation, and configuration decisions for workflow applications. To make it possible to select a good configuration in a reasonable time, we propose an approach that accelerates the exploration of the configuration space using a low-cost performance predictor that estimates the total execution time of a workflow application in a given setup. Our evaluation shows that the predictor (i) is effective in identifying the desired system configuration, (ii) scales to model a workflow application run on an entire cluster, and (iii) uses over 2000x fewer resources (machines × time) than running the actual application.
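The predictor-driven exploration described in the abstract can be illustrated with a minimal sketch. It is not the paper's predictor or search procedure: `toy_predictor`, its cost formula, the parameter names, and the exhaustive enumeration are all assumptions made for illustration. The sketch only shows the general idea of ranking candidate setups by a cheap predicted execution time instead of running the application on each one.

```python
from itertools import product


def toy_predictor(config):
    """Hypothetical stand-in for a low-cost performance predictor.

    Returns an invented estimate of total execution time (seconds).
    The formula is an assumption for illustration only.
    """
    compute = 1200.0 / config["app_nodes"]                       # more app nodes, less compute time
    io = 600.0 * config["replication"] / config["storage_nodes"]  # replicas add I/O work
    chunk_overhead = 50.0 * (256.0 / config["chunk_kb"])          # small chunks add per-chunk cost
    return compute + io + chunk_overhead


def explore(total_nodes, replications, chunk_sizes_kb, predictor):
    """Enumerate candidate setups and return (predicted_time, config) for the best one.

    Each candidate splits the nodes between the application layer and the
    storage layer, then picks a replication level and chunk size.
    """
    best = None
    for storage_nodes in range(1, total_nodes):
        app_nodes = total_nodes - storage_nodes
        for replication, chunk_kb in product(replications, chunk_sizes_kb):
            if replication > storage_nodes:
                continue  # cannot place more replicas than storage nodes
            config = {
                "app_nodes": app_nodes,
                "storage_nodes": storage_nodes,
                "replication": replication,
                "chunk_kb": chunk_kb,
            }
            predicted = predictor(config)
            if best is None or predicted < best[0]:
                best = (predicted, config)
    return best


if __name__ == "__main__":
    predicted_time, config = explore(8, [1, 2], [256, 1024], toy_predictor)
    print(predicted_time, config)
```

Because each candidate costs only one predictor call, the whole space can be scanned without occupying the cluster. A real predictor would replace `toy_predictor`, and a larger space might call for a search heuristic rather than full enumeration.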
Index Terms
- Supporting storage configuration for I/O intensive workflows