SpeQuloS: a QoS service for hybrid and elastic computing infrastructures

Cluster Computing

Abstract

The wide choice of Distributed Computing Infrastructures (DCIs) available allows users to select and combine their preferred architectures among Clusters, Grids, Clouds, Desktop Grids and more. In these hybrid DCIs, elasticity is emerging as a key property. In elastic infrastructures, the resources available to execute applications vary continuously, either because of application requirements or because of constraints on the infrastructure, such as node volatility.

In the latter case, there is no guarantee that the computing resources will remain available during the entire execution of an application. In this paper, we show that Bag-of-Tasks (BoT) execution on these “Best-Effort” infrastructures suffers from a drop in the task completion rate at the end of the execution.

The SpeQuloS service presented in this paper improves the Quality of Service (QoS) of BoT applications executed on hybrid and elastic infrastructures. SpeQuloS monitors the execution of the BoT and dynamically supplies fast and reliable Cloud resources when the critical part of the BoT is executed. SpeQuloS offers several features to hybrid DCI users, such as estimating completion time and execution speedup. Performance evaluation shows that BoT executions can be accelerated by a factor of 2, while offloading less than 2.5% of the workload to the Cloud.
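The monitor-then-provision idea described above can be illustrated with a minimal Python sketch. It is not the SpeQuloS implementation: the `BotProgress` structure, the function names, the linear extrapolation, and the 90% completion threshold are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BotProgress:
    """Snapshot of a Bag-of-Tasks execution (hypothetical structure)."""
    total_tasks: int
    completed_tasks: int
    elapsed_seconds: float

    @property
    def completion_ratio(self) -> float:
        # Fraction of the bag that has finished so far, in [0, 1].
        return self.completed_tasks / self.total_tasks


def estimate_completion_time(progress: BotProgress) -> float:
    """Estimate total execution time by linear extrapolation.

    Assumption: the remaining tasks complete at the average rate
    observed so far. This is a naive estimator, not the paper's.
    """
    if progress.completed_tasks == 0:
        raise ValueError("no completed task yet; cannot extrapolate")
    return progress.elapsed_seconds / progress.completion_ratio


def should_start_cloud_workers(progress: BotProgress,
                               threshold: float = 0.9) -> bool:
    """Decide whether to supply Cloud resources.

    Assumed policy: trigger once the completed fraction reaches
    `threshold`, i.e. when only the tail of the bag remains.
    """
    return progress.completion_ratio >= threshold


if __name__ == "__main__":
    halfway = BotProgress(total_tasks=1000, completed_tasks=500,
                          elapsed_seconds=3600.0)
    print(estimate_completion_time(halfway))
    print(should_start_cloud_workers(halfway))
```

Because the abstract reports that the completion rate drops at the end of the execution, a purely linear estimate such as this one would likely be optimistic on Best-Effort infrastructures; a real service would presumably need to correct for that tail.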

We report on several scenarios where SpeQuloS is deployed on hybrid infrastructures featuring a large variety of infrastructure combinations. In the context of the European Desktop Grid Initiative (EDGI), SpeQuloS is operated to improve the QoS of Desktop Grids using resources from private Clouds. We present a use case where SpeQuloS uses both EC2 regular and spot instances to decrease the cost of computation while preserving a similar QoS level. Finally, in the last scenario, SpeQuloS makes it possible to optimize Grid5000 resource utilization.

Notes

  1. http://graal.ens-lyon.fr/~sdelamar/spequlos/.

  2. https://api.grid5000.fr/.

Acknowledgements

The authors would like to thank Peter Kacsuk, Jozsef Kovacs, Michela Taufer, Trilce Estrada and Kate Keahey for their insightful comments and suggestions throughout our research and development of SpeQuloS.

Some of the experiments presented in this paper were carried out using the Grid5000 experimental testbed, being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several Universities as well as other funding bodies.

This work was funded by the EDGI project, supported by the European Commission FP7 Capacities Programme under grant agreement RI-261556.

Corresponding author

Correspondence to Gilles Fedak.

About this article

Cite this article

Delamare, S., Fedak, G., Kondo, D. et al. SpeQuloS: a QoS service for hybrid and elastic computing infrastructures. Cluster Comput 17, 79–100 (2014). https://doi.org/10.1007/s10586-013-0283-6

  • DOI: https://doi.org/10.1007/s10586-013-0283-6
