ACO-DPDGW: an ant colony optimization algorithm for data placement of data-intensive geospatial workflow

  • Methodology Article
Earth Science Informatics

Abstract

Massive data transmission between distributed data centers is the major efficiency bottleneck of geospatial workflows. Although many data placement methods have been proposed to overcome this problem, few studies have considered the impact of workflow structure. In this paper, we define the data placement problem for data-intensive geospatial workflows, with the aim of minimizing data transfer time. We propose an algorithm, ant colony optimization based data placement of data-intensive geospatial workflow (ACO-DPDGW), to solve this problem. The traditional workflow model is represented as a node vector; as the ants visit the nodes randomly, they place datasets and tasks in appropriate data centers according to a combination of pheromone information and heuristic information. To prevent premature convergence, a variable neighborhood search operation is embedded into ACO-DPDGW. Experiments show that the algorithm reduces data transfer volume and data transfer time even as the numbers of datasets, tasks, and data centers increase.
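
The placement step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the selection formula, so the code below assumes the generic Ant System rule, in which the probability of choosing a data center is proportional to pheromone^alpha × heuristic^beta. The function names, the parameters `alpha` and `beta`, and the way the node vector is stored (a plain list of node identifiers covering both datasets and tasks) are all hypothetical and are not taken from the paper.

```python
import random


def placement_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Selection probabilities over data centers for a single node.

    Assumes the generic Ant System rule (not confirmed as the paper's rule):
    weight_k = pheromone[k] ** alpha * heuristic[k] ** beta.
    All pheromone and heuristic values are assumed to be positive.
    """
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    return [w / total for w in weights]


def choose_data_center(pheromone, heuristic, rng, alpha=1.0, beta=2.0):
    """Sample one data center index according to the probabilities above."""
    probs = placement_probabilities(pheromone, heuristic, alpha, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def construct_placement(nodes, pheromone, heuristic, rng):
    """Build one ant's solution: a mapping from node to data center index.

    Nodes (datasets and tasks) are visited in a shuffled order, which is one
    possible reading of "visit the nodes randomly" (an assumption).
    """
    order = list(nodes)
    rng.shuffle(order)
    return {
        node: choose_data_center(pheromone[node], heuristic[node], rng)
        for node in order
    }
```

For example, with two data centers, equal pheromone values, and heuristic values of 1 and 3, calling `placement_probabilities` with `alpha=1` and `beta=1` weights the second data center three times as heavily as the first. A full algorithm would also need pheromone updates and the variable neighborhood search step, neither of which is sketched here.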

Acknowledgments

This research was supported by the Key Science and Technology Plan Projects of Fujian Province (2015H0015), the Education and Technology Plan Projects of Fujian Province (JAT160088), and the Foundation of the China Scholarship Council (201706655035).

Author information

Contributions

Xiaozhu Wu and Ying Liu conceived, designed and performed the experiments. All of the authors analyzed the data. Xiaozhu Wu wrote the paper. Xiaozhu Wu and Ying Liu revised the paper.

Corresponding author

Correspondence to Xiaozhu Wu.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, X., Liu, Y. & Chen, C. ACO-DPDGW: an ant colony optimization algorithm for data placement of data-intensive geospatial workflow. Earth Sci Inform 12, 641–658 (2019). https://doi.org/10.1007/s12145-019-00401-3

  • DOI: https://doi.org/10.1007/s12145-019-00401-3
