Abstract
This paper studies the intermediate dataset storage problem for a linear dataflow in multiple clouds. The proliferation of cloud computing allows users to flexibly store, re-compute, or transfer large generated datasets across multiple cloud service providers. Under the pay-as-you-go model, however, the total cost of using cloud services depends on the storage, computation, and bandwidth resources consumed. Because cloud service providers price these resources differently, a user can reduce the total cost of dataset storage and re-computation by choosing, for each generated dataset, whether to store it in a particular cloud, delete it and regenerate it when needed, or transfer it to another cloud service. The best previously known algorithm for finding an optimal strategy for a linear dataflow in multiple clouds runs in \(O(m^4 n^3)\) time, where \(m\) is the number of clouds and \(n\) is the number of datasets in the dataflow. In this paper, we present an improved algorithm for the linear dataflow with time complexity \(O(m^3 n^3)\).
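To make the store-or-regenerate trade-off concrete, the following is a minimal sketch under a deliberately simplified cost model that is an assumption of this example, not the paper's model. Each dataset in the chain is either stored in one cloud (paying size times that cloud's storage price per time unit) or deleted (paying, per use, the compute cost of regenerating it from the nearest stored predecessor, in that predecessor's cloud). Bandwidth and transfer charges, which the abstract mentions, are ignored here. The function names `cost_rate` and `brute_force` and all parameter names are hypothetical. The search is exhaustive over \((m+1)^n\) strategies, so it only illustrates the search space that polynomial-time algorithms avoid; it is not the \(O(m^3 n^3)\) algorithm.

```python
from itertools import product


def cost_rate(strategy, sizes, gen_time, use_rate, storage_price, compute_price):
    """Cost per time unit of one strategy under the simplified (assumed) model.

    strategy[i] is None if dataset i is deleted, else the index of the cloud
    that stores it. Transfer/bandwidth costs are NOT modelled.
    """
    total = 0.0
    src_cloud = None  # cloud holding the nearest stored predecessor, if any
    pending = 0.0     # compute time accumulated since that predecessor
    for i, choice in enumerate(strategy):
        if choice is not None:
            # Stored: pay storage in the chosen cloud; later datasets
            # regenerate from here.
            total += sizes[i] * storage_price[choice]
            src_cloud, pending = choice, 0.0
        else:
            # Deleted: each use regenerates the chain since the last stored
            # dataset. With no stored predecessor, assume the cheapest
            # compute cloud is used.
            pending += gen_time[i]
            if src_cloud is not None:
                price = compute_price[src_cloud]
            else:
                price = min(compute_price)
            total += use_rate[i] * pending * price
    return total


def brute_force(sizes, gen_time, use_rate, storage_price, compute_price):
    """Try every strategy; exponential time, for tiny illustrative inputs only."""
    options = [None] + list(range(len(storage_price)))

    def rate(s):
        return cost_rate(s, sizes, gen_time, use_rate, storage_price, compute_price)

    best = min(product(options, repeat=len(sizes)), key=rate)
    return best, rate(best)


if __name__ == "__main__":
    # Two datasets, two clouds: cloud 0 has cheap storage, cloud 1 cheap compute.
    best, cost = brute_force(
        sizes=[10, 1],
        gen_time=[1, 5],
        use_rate=[0.1, 1.0],
        storage_price=[0.01, 0.02],
        compute_price=[1.0, 0.5],
    )
    print(best, cost)
```

In this toy instance the large, rarely used first dataset is cheaper to delete, while the small, frequently used second dataset is cheaper to keep in the cloud with the lower storage price.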
Acknowledgement
The author thanks the reviewers for their constructive suggestions.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wang, Y., Cheng, K., Li, Z. (2018). Improved Algorithm for Finding the Minimum Cost of Storing and Regenerating Datasets in Multiple Clouds. In: Wang, L., Zhu, D. (eds) Computing and Combinatorics. COCOON 2018. Lecture Notes in Computer Science, vol 10976. Springer, Cham. https://doi.org/10.1007/978-3-319-94776-1_35
DOI: https://doi.org/10.1007/978-3-319-94776-1_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94775-4
Online ISBN: 978-3-319-94776-1
eBook Packages: Computer Science, Computer Science (R0)