Abstract
Modern computing environments such as clouds, grids, and HPC clusters are complex and costly installations, so utilizing them well has always been a major challenge. Workload scheduling is a critical process in every production system, and an inadequate or poorly configured scheduler can hamper overall performance. Researchers and system administrators therefore frequently use historic workload traces to model and analyze the behavior of real systems and to improve existing scheduling approaches. In this work we provide such real-life workload traces from the CERIT-SC system. Importantly, our traces describe a “mixed” workload consisting of both cloud VMs and grid jobs executed over a shared computing infrastructure. The provided workloads represent an interesting scheduling problem. First, mixing grid jobs and cloud VMs increases the complexity of the (co)scheduling needed to use the underlying physical infrastructure efficiently. Second, we provide a detailed description of the system setup, its operational constraints, and unresolved issues, putting the observed workloads into a broader context. Last but not least, the workloads are freely available to the scientific community, allowing further independent research and analysis.
Notes
1. A nice overview can be found at: http://bit.ly/2kLf44d.
2. The VM overcommitment factor is computed as \( \text {vCPUs}/\text {CPUs}\).
3.
4. This is the dukan cluster, which is not part of the CERIT-SC infrastructure but executes similar workloads from the same user base.
5. This log is available at: https://github.com/CERIT-SC/cerit-maintenance.
6. For example, real schedulers must limit the number of concurrently running licensed applications (jobs using licensed software) to the number of available software licenses, i.e., even if resources are free, some jobs must wait until a license becomes available. Such information is not usually recorded in the workload.
7.
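The overcommitment factor from note 2 and the license constraint from note 6 can be illustrated with a minimal Python sketch. It is not taken from the CERIT-SC scheduler: the `Job` record, its field names, and the `can_start` helper are hypothetical and serve only to show how the two rules read in code.

```python
from dataclasses import dataclass


def overcommit_factor(vcpus: int, cpus: int) -> float:
    """Return vCPUs / CPUs for one physical host (note 2)."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return vcpus / cpus


@dataclass
class Job:
    # Hypothetical job record; the field names are illustrative only.
    cpus: int          # CPU cores requested by the job
    licenses: int = 0  # software licenses the job holds while running


def can_start(job: Job, free_cpus: int, free_licenses: int) -> bool:
    """A job may start only if CPUs AND licenses are both available (note 6)."""
    return job.cpus <= free_cpus and job.licenses <= free_licenses
```

For instance, a host with 16 physical cores running VMs that total 32 vCPUs has an overcommitment factor of 2.0, and a job that needs one license stays queued while `free_licenses` is 0 even if enough CPUs are idle. A trace that omits license counts cannot explain such waiting time, which is the point made in note 6.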
Acknowledgments
We kindly acknowledge the support and computational resources provided by the MetaCentrum under the program LM2015042 and the CERIT Scientific Cloud under the program LM2015085, provided under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” and the project Reg. No. CZ.02.1.01/0.0/0.0/16_013/0001797 co-funded by the Ministry of Education, Youth and Sports of the Czech Republic. We also highly appreciate the access to CERIT Scientific Cloud workload traces.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Klusáček, D., Parák, B. (2018). Analysis of Mixed Workloads from Shared Cloud Infrastructure. In: Klusáček, D., Cirne, W., Desai, N. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2017. Lecture Notes in Computer Science, vol 10773. Springer, Cham. https://doi.org/10.1007/978-3-319-77398-8_2
DOI: https://doi.org/10.1007/978-3-319-77398-8_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77397-1
Online ISBN: 978-3-319-77398-8
eBook Packages: Computer Science (R0)