A New Data Placement Approach for Scientific Workflows in Cloud Computing Environments

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 557)

Abstract

The spread of Cloud Computing technologies has made it practical to run data-intensive applications such as scientific workflows, which process huge volumes of scientific data across dispersed computing infrastructures. One key characteristic of Cloud Computing is elasticity, which allows workflows to dynamically provision the resources needed for task execution. Processing massive data with scientific workflows increases data transmission, lengthens execution delay, and incurs a high bandwidth cost. To reduce both the execution cost of workflows and data movement, data placement optimization techniques must therefore be taken into consideration. When datasets are placed during the execution of the tasks of a job in a workflow, dependencies exist both between datasets and between tasks. In this paper, we propose a data placement approach based on a heuristic genetic algorithm that takes control-flow and data-flow dependencies into account, in order to reduce data movement and thus resource utilization in cloud environments.
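
To make the idea concrete, the following is a minimal sketch of a genetic algorithm for dataset placement. It is not the authors' algorithm, whose details are not given in this abstract. The toy workflow (`TASK_DATA`), the number of data centers, the capacity limit, the penalty weight, and all function names are assumptions chosen for illustration. Only data dependencies between tasks are modeled (tasks that share datasets); control-flow ordering is left out.

```python
import random

# Hypothetical toy workflow: the datasets each task reads or writes.
TASK_DATA = {
    "t1": ["d1", "d2"],
    "t2": ["d2", "d3"],
    "t3": ["d4", "d5"],
    "t4": ["d5", "d6"],
}
DATASETS = sorted({d for ds in TASK_DATA.values() for d in ds})
NUM_CENTERS = 2   # assumed number of data centers
CAPACITY = 3      # assumed maximum number of datasets per data center


def cost(chromosome):
    """Data movements plus a penalty for exceeding data center capacity.

    chromosome[i] is the data center that stores DATASETS[i].
    """
    where = dict(zip(DATASETS, chromosome))
    moves = 0
    for needed in TASK_DATA.values():
        per_center = [0] * NUM_CENTERS
        for d in needed:
            per_center[where[d]] += 1
        # The task runs where most of its data lives; the rest must move.
        moves += len(needed) - max(per_center)
    overflow = sum(max(0, chromosome.count(c) - CAPACITY)
                   for c in range(NUM_CENTERS))
    return moves + 10 * overflow  # penalty weight of 10 is an assumption


def evolve(generations=40, pop_size=20, mutation_rate=0.1, seed=0):
    """Return (best_chromosome, best_cost_per_generation)."""
    rng = random.Random(seed)
    n = len(DATASETS)
    pop = [[rng.randrange(NUM_CENTERS) for _ in range(n)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        next_pop = [pop[0][:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=cost)
            p2 = min(rng.sample(pop, 3), key=cost)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally reassign a dataset to a random center.
            child = [rng.randrange(NUM_CENTERS) if rng.random() < mutation_rate
                     else gene for gene in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history
```

Calling `evolve()` returns a placement as a list of data center indices, one per entry of `DATASETS`, together with the best cost seen in each generation. Because the best individual is always carried over, that cost never increases from one generation to the next. A placement that keeps `d1`, `d2`, `d3` together and `d4`, `d5`, `d6` together has cost 0 under this toy model, since no task then needs a dataset from the other data center.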

Acknowledgments

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.

Author information

Corresponding author

Correspondence to Hamdi Kchaou.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kchaou, H., Kechaou, Z., Alimi, A.M. (2017). A New Data Placement Approach for Scientific Workflows in Cloud Computing Environments. In: Madureira, A., Abraham, A., Gamboa, D., Novais, P. (eds) Intelligent Systems Design and Applications. ISDA 2016. Advances in Intelligent Systems and Computing, vol 557. Springer, Cham. https://doi.org/10.1007/978-3-319-53480-0_33

  • DOI: https://doi.org/10.1007/978-3-319-53480-0_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53479-4

  • Online ISBN: 978-3-319-53480-0

  • eBook Packages: Engineering, Engineering (R0)
