Reasoning task dependencies for robust service selection in data intensive workflows

Computing

Abstract

Selecting appropriate services for task execution in workflows should not only satisfy budget and deadline constraints, but also maximize the probability that the workflow succeeds and minimize the potential loss when exceptions occur. This requirement is more critical for data-intensive applications in grids or clouds, since any failure is costly. We therefore design a fine-grained risk evaluation model, customized for workflows, that precisely computes the cost of failure for the selected services. In contrast to current coarse-grained models, ours takes task dependencies into account and assigns a higher impact factor to tasks near the end of the workflow. We then build a utility function on this model and apply a genetic algorithm to find optimized service allocations, maximizing the robustness of the workflow while minimizing the risk of failure. Experiments and analysis show that applying the customized risk evaluation model to service selection generally improves the success probability of a workflow while reducing its exposure to risk.
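The idea of a dependency-aware risk measure can be illustrated with a small sketch. It is not the model from the paper, whose formulas are not reproduced on this page. It assumes, purely for illustration, that a failing task wastes its own cost plus the cost of every task it depends on, that tasks fail independently, and that utility is the success probability minus a weighted expected loss. The workflow, the service figures, `RISK_WEIGHT`, and all function names are hypothetical, and exhaustive search stands in for the genetic algorithm to keep the example short.

```python
from itertools import product

# Hypothetical workflow DAG: task -> direct predecessors.
PREDECESSORS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

# Hypothetical candidate services per task as (cost, failure probability).
SERVICES = {
    "A": [(4.0, 0.10), (6.0, 0.02)],
    "B": [(3.0, 0.08), (5.0, 0.01)],
    "C": [(2.0, 0.12), (4.0, 0.03)],
    "D": [(5.0, 0.10), (8.0, 0.02)],
}

RISK_WEIGHT = 0.05  # assumed trade-off between success probability and loss


def ancestors(task):
    """Return every task that must finish before `task` can start."""
    seen, stack = set(), list(PREDECESSORS[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PREDECESSORS[current])
    return seen


def expected_loss(assignment):
    """Sum of P(task fails) times the cost wasted if it fails.

    Assumption: the wasted cost is the task's own cost plus the cost of
    all its ancestors, so tasks late in the workflow weigh more.
    """
    total = 0.0
    for task, (cost, p_fail) in assignment.items():
        wasted = cost + sum(assignment[a][0] for a in ancestors(task))
        total += p_fail * wasted
    return total


def success_probability(assignment):
    """Probability that no task fails, assuming independent failures."""
    prob = 1.0
    for _cost, p_fail in assignment.values():
        prob *= 1.0 - p_fail
    return prob


def utility(assignment):
    """Assumed utility: reward success, penalize expected loss."""
    return success_probability(assignment) - RISK_WEIGHT * expected_loss(assignment)


def best_assignment(budget):
    """Exhaustively pick one service per task within the budget.

    Returns None when no combination fits the budget.
    """
    tasks = list(SERVICES)
    best = None
    for choice in product(*(SERVICES[t] for t in tasks)):
        if sum(cost for cost, _ in choice) > budget:
            continue
        candidate = dict(zip(tasks, choice))
        if best is None or utility(candidate) > utility(best):
            best = candidate
    return best
```

Under these assumptions, the same failure probability costs more at task D than at task A, because a failure at D discards the work of A, B, and C as well. For larger workflows the exhaustive loop in `best_assignment` becomes infeasible, which is where a heuristic search such as a genetic algorithm would take over.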

Acknowledgments

The research reported in this paper is supported by the National Science Foundation of China under Grant Nos. 61100172 and 61272512. A preliminary version of this paper appeared in the 2012 IPDPS Workshop of Large Scale Distributed Service-oriented Systems.

Author information

Corresponding author

Correspondence to Mingzhong Wang.

About this article

Cite this article

Wang, M., Zhu, L. & Ramamohanarao, K. Reasoning task dependencies for robust service selection in data intensive workflows. Computing 97, 337–355 (2015). https://doi.org/10.1007/s00607-013-0381-6

  • DOI: https://doi.org/10.1007/s00607-013-0381-6
