Abstract
The spread of Cloud Computing technologies has made it possible to handle massive-data applications such as scientific workflows, which process huge volumes of scientific data on distributed computing infrastructures. Among the characteristics of Cloud Computing is elasticity, which allows workflows to dynamically provision the resources needed for task execution. Processing massive data with scientific workflows increases data transmission, lengthens execution delay, and incurs a high bandwidth cost. To reduce the execution cost of workflows and the amount of data movement, data placement optimization techniques must therefore be taken into consideration. When datasets are placed during the execution of a job's tasks in a workflow, there are dependencies both between datasets and between tasks. In this paper, we propose a data placement approach based on a heuristic genetic algorithm that takes control-flow and data-flow dependencies into account, in order to reduce data movement and hence resource utilization in cloud environments.
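To make the idea of genetic-algorithm-based data placement concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a toy workflow in which each task reads a fixed set of datasets, a chromosome assigns every dataset to one data center, and fitness is the volume of data that must be moved when each task runs where most of its input already resides. The dataset sizes, task inputs, capacity, penalty weight, and all function names are hypothetical, and only data dependencies (tasks sharing datasets) are modeled; control-flow dependencies are left out.

```python
import random

# Hypothetical toy workflow: dataset sizes (GB) and the datasets each task reads.
SIZES = {"d1": 10, "d2": 4, "d3": 7, "d4": 2, "d5": 5}
TASKS = {"t1": ["d1", "d2"], "t2": ["d2", "d3"],
         "t3": ["d4", "d5"], "t4": ["d1", "d3"]}
N_CENTERS = 2    # assumed number of data centers
CAPACITY = 20    # assumed storage capacity per data center (GB)
DATASETS = sorted(SIZES)  # fixed gene order: d1..d5


def movement_cost(placement):
    """GB moved if each task runs at the center holding most of its input."""
    total = 0
    for inputs in TASKS.values():
        per_center = [0] * N_CENTERS
        for d in inputs:
            per_center[placement[d]] += SIZES[d]
        total += sum(per_center) - max(per_center)
    return total


def fitness(chromosome):
    """Movement cost plus a penalty for exceeding capacity (lower is better)."""
    placement = dict(zip(DATASETS, chromosome))
    load = [0] * N_CENTERS
    for d, center in placement.items():
        load[center] += SIZES[d]
    overflow = sum(max(0, used - CAPACITY) for used in load)
    return movement_cost(placement) + 100 * overflow


def evolve(generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    """Simple GA: truncation selection, one-point crossover, per-gene mutation."""
    rng = random.Random(seed)
    pop = [[rng.randrange(N_CENTERS) for _ in DATASETS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(DATASETS))
            child = a[:cut] + b[cut:]
            child = [rng.randrange(N_CENTERS) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = survivors + children
    best = min(pop, key=fitness)
    return dict(zip(DATASETS, best)), fitness(best)


if __name__ == "__main__":
    placement, cost = evolve()
    print(placement, cost)
```

In this toy instance the total data volume (28 GB) exceeds the capacity of a single center, so the search has to split datasets while keeping those that are used together on the same center. A fuller model would also encode task ordering and intermediate datasets produced during execution.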
Acknowledgments
The authors would like to acknowledge the financial support of this work by grants from General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Kchaou, H., Kechaou, Z., Alimi, A.M. (2017). A New Data Placement Approach for Scientific Workflows in Cloud Computing Environments. In: Madureira, A., Abraham, A., Gamboa, D., Novais, P. (eds) Intelligent Systems Design and Applications. ISDA 2016. Advances in Intelligent Systems and Computing, vol 557. Springer, Cham. https://doi.org/10.1007/978-3-319-53480-0_33
DOI: https://doi.org/10.1007/978-3-319-53480-0_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53479-4
Online ISBN: 978-3-319-53480-0
eBook Packages: Engineering, Engineering (R0)