Abstract
We present a scientific workflow data management solution that combines global data access with block-level optimization of data transfer: only the data blocks actually used by a remote job are transferred over the network, which significantly reduces data movement for certain common data access patterns. We propose an implementation of this solution based on the HyperFlow workflow management system and the Onedata data management platform. Preliminary results confirm the advantages of the proposed solution.
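To illustrate why block-level transfer can reduce data movement, here is a minimal sketch. It assumes a file is split into fixed-size blocks and that a remote job's reads are known as (offset, length) pairs. Only the blocks those reads overlap are fetched, instead of the whole file. The block size and the function names `blocks_touched` and `bytes_to_transfer` are hypothetical illustrations. They are not Onedata's API or its actual block-management policy.

```python
from __future__ import annotations

from typing import Iterable

MiB = 1024 * 1024


def blocks_touched(reads: Iterable[tuple[int, int]], block_size: int) -> set[int]:
    """Return indices of fixed-size blocks overlapped by (offset, length) reads.

    Hypothetical helper: assumes fixed-size blocks, which is a simplification.
    """
    touched: set[int] = set()
    for offset, length in reads:
        if length <= 0:
            continue  # empty read touches no block
        first = offset // block_size
        last = (offset + length - 1) // block_size
        touched.update(range(first, last + 1))
    return touched


def bytes_to_transfer(
    reads: Iterable[tuple[int, int]], block_size: int, file_size: int
) -> int:
    """Bytes moved if only the touched blocks are fetched (last block may be short)."""
    total = 0
    for idx in blocks_touched(reads, block_size):
        start = idx * block_size
        if start >= file_size:
            continue  # read past end of file: nothing to fetch
        total += min(block_size, file_size - start)
    return total


if __name__ == "__main__":
    # Hypothetical job: reads a 1 MiB header and a 2 MiB slice of a 100 MiB file.
    reads = [(0, 1 * MiB), (50 * MiB, 2 * MiB)]
    moved = bytes_to_transfer(reads, block_size=4 * MiB, file_size=100 * MiB)
    print(f"block-level: {moved // MiB} MiB, whole file: 100 MiB")  # 8 MiB vs 100 MiB
```

In this example, the job touches only two 4 MiB blocks, so 8 MiB cross the network instead of 100 MiB. The gain depends entirely on the access pattern: a job that reads the whole file sequentially would transfer the same amount either way.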
Acknowledgements
This work is supported by the following grants. BB: National Science Centre, Poland, grant 2016/21/B/ST6/01497. LD: 2018–2020’s research funds in the scope of the co-financed international projects framework (projects: XDC 3958/H2020/2018/2 and EOSC-hub 3905/H2020/2018/2). This work was partially supported by the Polish Ministry of Science and Higher Education under subvention funds for the AGH University of Science and Technology.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Orzechowski, M., Baliś, B., Dutka, Ł., Słota, R.G., Kitowski, J. (2020). Transparent Data Access for Scientific Workflows Across Clouds. In: Schwardmann, U., et al. Euro-Par 2019: Parallel Processing Workshops. Euro-Par 2019. Lecture Notes in Computer Science, vol 11997. Springer, Cham. https://doi.org/10.1007/978-3-030-48340-1_62
DOI: https://doi.org/10.1007/978-3-030-48340-1_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48339-5
Online ISBN: 978-3-030-48340-1
eBook Packages: Computer Science, Computer Science (R0)