Abstract
Selecting services for task execution in workflows should not only respect budget and deadline constraints but also maximize the probability that the workflow succeeds and minimize the potential loss when exceptions occur. This requirement is especially critical for data-intensive applications in grids or clouds, where any failure is costly. We therefore design a fine-grained risk evaluation model, customized for workflows, that precisely computes the cost of failure for the selected services. In contrast to current coarse-grained models, ours takes task dependencies into account and assigns higher impact factors to tasks near the end of the workflow. We then build a utility function on this model and apply a genetic algorithm to find optimized service allocations, maximizing the robustness of the workflow while minimizing its risk of failure. Experiments and analysis show that applying the customized risk evaluation model to service selection generally improves the success probability of a workflow while reducing its exposure to risk.
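The abstract does not give the model's formulas, so the following Python code is only a minimal sketch of the general idea under our own simplifying assumptions. It treats the workflow as a sequential chain of tasks rather than a general DAG. Each candidate service has a cost, a duration and a reliability, and a service is paid even when it fails. A failure at task i wastes everything spent on tasks up to and including i. Under these assumptions, later tasks automatically carry a larger potential loss. The names `Service`, `expected_loss`, `utility` and `genetic_search` are hypothetical, as are the weighting in `utility` and the GA operators (truncation selection, one-point crossover, per-gene mutation). They illustrate the technique and are not the authors' model or implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    cost: float         # price of one invocation (assumed charged even on failure)
    time: float         # execution time
    reliability: float  # probability that the invocation succeeds


def success_probability(plan):
    """Probability that every task in a sequential chain succeeds."""
    p = 1.0
    for s in plan:
        p *= s.reliability
    return p


def expected_loss(plan):
    """Expected money wasted if the chain fails (sequential-chain assumption).

    A failure at task i is reached only if tasks 0..i-1 succeeded, and it
    wastes everything spent on tasks 0..i, so later tasks weigh more.
    """
    loss, reach, spent = 0.0, 1.0, 0.0
    for s in plan:
        spent += s.cost
        loss += reach * (1.0 - s.reliability) * spent
        reach *= s.reliability
    return loss


def utility(plan, budget, deadline, weight=0.01):
    """Hypothetical utility: success probability minus weighted expected loss."""
    if sum(s.cost for s in plan) > budget or sum(s.time for s in plan) > deadline:
        return float("-inf")  # violates the budget or deadline constraint
    return success_probability(plan) - weight * expected_loss(plan)


def genetic_search(candidates, budget, deadline, pop_size=30,
                   generations=60, mutation_rate=0.1, seed=0):
    """Simple GA: gene t is the index of the service chosen for task t."""
    rng = random.Random(seed)
    n = len(candidates)

    def decode(genes):
        return [candidates[t][g] for t, g in enumerate(genes)]

    def fitness(genes):
        return utility(decode(genes), budget, deadline)

    population = [[rng.randrange(len(c)) for c in candidates]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover builds a new list
            for t in range(n):
                if rng.random() < mutation_rate:  # per-gene random reset
                    child[t] = rng.randrange(len(candidates[t]))
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return decode(best), fitness(best)


# Illustrative data only: two candidate services per task.
candidates = [
    [Service(10, 5, 0.90), Service(20, 4, 0.99)],  # task 0: cheap vs. reliable
    [Service(15, 6, 0.85), Service(30, 5, 0.98)],  # task 1
    [Service(25, 8, 0.80), Service(40, 6, 0.97)],  # task 2
]
plan, score = genetic_search(candidates, budget=90, deadline=18)
print(plan, round(score, 4))
```

In this sketch, placing an unreliable service at the end of the chain yields a larger expected loss than placing it at the start. When it fails late, more completed (and paid-for) work is wasted, which mirrors the abstract's point that tasks near the end deserve a higher impact factor.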
Acknowledgments
The research reported in this paper is supported by the National Science Foundation of China under Grant Nos. 61100172 and 61272512. A preliminary version of this paper appeared in the 2012 IPDPS Workshop on Large Scale Distributed Service-oriented Systems.
About this article
Cite this article
Wang, M., Zhu, L. & Ramamohanarao, K. Reasoning task dependencies for robust service selection in data intensive workflows. Computing 97, 337–355 (2015). https://doi.org/10.1007/s00607-013-0381-6
DOI: https://doi.org/10.1007/s00607-013-0381-6