
Characterizing and modeling cloud applications/jobs on a Google data center

Published in The Journal of Supercomputing

Abstract

In this paper, we characterize and model Google applications and jobs, based on a one-month trace from a large-scale Google data center. We make four contributions: (1) we compute key statistics about task events and resource utilization for Google applications, broken down by resource type and execution type; (2) we classify applications via a K-means clustering algorithm with an optimized number of clusters, based on task events and resource usage; (3) we study the correlation between Google application properties and runtime features (e.g., job priority and scheduling class); (4) we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the trace. Experiments show that tasks simulated with our model exhibit features closely analogous to those in the Google trace: over 95 % of tasks have simulation errors below 20 %, confirming the high accuracy of our simulation model.
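
As an illustration of the clustering step mentioned above (a minimal sketch, not the paper's exact pipeline), the following code runs plain K-means (Lloyd's algorithm) over per-task resource-usage vectors. The task data and the choice k = 2 are invented for the example; the paper additionally optimizes the number of clusters, which is omitted here.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain K-means (Lloyd's algorithm) over small feature vectors,
    e.g. (CPU usage, memory usage) per task."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Update step: move each non-empty cluster's center to its mean.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = tuple(sum(col) / len(cl) for col in zip(*cl))
    return centers, clusters

# Synthetic stand-in data: two obvious "application" groups,
# low-usage tasks vs. high-usage tasks (normalized units).
tasks = [(0.01, 0.02), (0.02, 0.01), (0.015, 0.02),
         (0.40, 0.50), (0.45, 0.48), (0.50, 0.45)]
centers, clusters = kmeans(tasks, k=2)
sizes = sorted(len(c) for c in clusters)
print(sizes)  # [3, 3]
```

With well-separated groups like these, Lloyd's iterations converge to the natural 3/3 split regardless of which two points are drawn as initial centers.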


Notes

  1. Scheduling class (0–3), according to [3], roughly represents how latency-sensitive a job/task is: 3 denotes the most latency-sensitive tasks and 0 denotes non-production tasks.

  2. The Google trace does not expose the exact memory size used by jobs, but rather values scaled relative to the maximum memory capacity of a node. For example, if the maximum memory capacity on a host is 64 GB, a memory size of 0.05 means \(0.05 \times 64 = 3.2\) GB.
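
A one-line helper makes the note's conversion concrete; the 64 GB capacity is just the note's example figure, not a value taken from the trace.

```python
def scaled_to_gb(scaled_mem, host_capacity_gb):
    """Convert the trace's normalized memory value back to gigabytes.
    host_capacity_gb is the assumed capacity of the reference node."""
    return scaled_mem * host_capacity_gb

# The note's example: a scaled value of 0.05 on a 64 GB host.
print(scaled_to_gb(0.05, 64))  # 3.2
```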

  3. According to the Google trace documentation [4], task interruptions have several causes: (1) fail event: a task or job was descheduled (or, in rare cases, ceased to be eligible for scheduling while it was pending) due to a task failure; (2) evict event: a task or job was descheduled because of a higher-priority task or job, because the scheduler overcommitted and actual demand exceeded the machine capacity, because the machine on which it was running became unusable, or because a disk holding the task's data was lost; (3) kill event: a task or job was canceled, or another job or task on which it depended died; (4) lost event: a task or job was presumably terminated, but its termination record is missing.
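
The four interruption causes above can be collected into a small lookup table. This is purely illustrative: the string keys are descriptive names chosen for this sketch, whereas the actual trace encodes event types as numeric codes.

```python
# Illustrative labels for the four interruption causes described in [4].
INTERRUPTION_EVENTS = {
    "fail":  "descheduled due to a task failure",
    "evict": "preempted by higher-priority work, scheduler overcommit, "
             "machine loss, or a lost disk",
    "kill":  "cancelled, or a job/task it depended on died",
    "lost":  "presumably terminated, but the record is missing",
}

def is_interruption(event_name):
    """True if the (descriptive) event name denotes an interruption."""
    return event_name in INTERRUPTION_EVENTS

print(is_interruption("evict"), is_interruption("finish"))  # True False
```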

References

  1. Armbrust M, Fox A, Griffith R, Joseph A et al (2009) Above the clouds: a Berkeley view of cloud computing. EECS Department, University of California, Berkeley, Technical Report UCB/EECS-2009-28

  2. Vaquero L, Rodero-Merino L, Caceres J, Lindner M (2009) A break in the clouds: towards a cloud definition. SIGCOMM Comput Commun Rev 39(1):50–55

  3. Wilkes J (2011) More Google cluster data. Google research blog. http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html

  4. Reiss C, Wilkes J, Hellerstein J (2012) Google cluster-usage traces: format + schema. Google Inc., Mountain View, USA, Technical Report

  5. Di S, Kondo D, Cirne W (2012) Characterization and comparison of cloud versus grid workloads. In: IEEE international conference on cluster computing (Cluster'12), pp 230–238

  6. Meng X, Isci C, Kephart J, Zhang L, Bouillet E, Pendarakis D (2010) Efficient resource provisioning in compute clouds via VM multiplexing. In: Proceedings of the 7th international conference on autonomic computing (ICAC’10), New York, ACM, pp 11–20

  7. Buyya R, Ranjan R, Calheiros R (2010) Intercloud: utility-oriented federation of cloud computing environments for scaling of application services. In: 10th international conference on algorithms and architectures for parallel processing (ICA3PP’10), pp 13–31

  8. Stillwell M, Vivien F, Casanova H (2012) Virtual machine resource allocation for service hosting on heterogeneous distributed platforms. In: Proceedings of IEEE 26th international conference on parallel distributed processing symposium (IPDPS’12), pp 786–797

  9. Calheiros R, Ranjan R, Beloglazov A, De-Rose C, Buyya R (2011) Cloudsim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw Pract Exp 41(1):23–50

  10. Di S, Wang C-L (2013) Dynamic optimization of multi-attribute resource allocation in self-organizing clouds. IEEE Trans Parallel Distrib Syst (TPDS) 24(3):464–478

  11. Dean J, Ghemawat S (2004) MapReduce: Simplified data processing on large clusters. In: 5th USENIX symposium on operating systems design and implementation (OSDI’04), pp 137–150

  12. Reiss C, Tumanov A, Ganger G, Katz R, Kozuch M (2012) Towards understanding heterogeneous clouds at scale: Google trace analysis. Intel science and technology center for cloud computing. Carnegie Mellon University, Pittsburgh, Technical Report ISTC-CC-TR-12-101

  13. Feitelson D (2011) Workload modeling for computer systems performance evaluation. http://www.cs.huji.ac.il/~feit/wlmod/

  14. Koch R (1997) The 80/20 principle: the secret of achieving more with less. Nicholas Brealey

  15. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297

  16. Okabe A, Boots B, Sugihara K, Chiu S (2000) Spatial tessellations: concepts and applications of Voronoi diagrams, 2nd edn. Series in probability and statistics. Wiley, England

  17. Ross S (2010) Introduction to probability models, 10th edn. Academic Press, Burlington

  18. Sharma B, Chudnovsky V, Hellerstein J, Rifaat R, Das C (2011) Modeling and synthesizing task placement constraints in google compute clusters. In: Proceedings of the 2nd ACM symposium on cloud computing (SOCC’11), New York, ACM, pp 3:1–3:14

  19. Mishra A, Hellerstein J, Cirne W, Das C-R (2010) Towards characterizing cloud backend workloads: insights from Google compute clusters. SIGMETRICS Perform Eval Rev 37(4):34–41

  20. Zhang Q, Hellerstein JL, Boutaba R (2011) Characterizing task usage shapes in Google compute clusters. In: Large scale distributed systems and middleware workshop (LADIS’11)

  21. Liu Z, Cho S (2012) Characterizing machines and workloads on a Google cluster. In: 8th international workshop on scheduling and resource management for parallel and distributed systems (SRMPDS’12), pp 397–403

  22. Ganapathi A, Chen Y, Fox A, Katz RH, Patterson DA (2010) Statistics-driven workload modeling for the cloud. ICDE workshops’10, pp 87–92

  23. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: IEEE 26th symposium on mass storage systems and technologies (MSST’10), pp 1–10

  24. Li A, Zong X, Kandula S, Yang X, Zhang M (2011) CloudProphet: towards application performance prediction in cloud. ACM SIGCOMM student poster, pp 426–427

  25. Jackson KR, Ramakrishnan L, Muriki K et al (2010) Performance analysis of high performance computing applications on the Amazon Web Services cloud. In: Proceedings of the IEEE 2nd international conference on cloud computing technology and science (CloudCom’10), Washington, DC, IEEE Computer Society, pp 159–168

  26. Hamerly G, Elkan C (2002) Alternatives to the k-means algorithm that find better clusterings. In: Proceedings of the 17th international conference on Information and knowledge management (CIKM’02), New York, ACM, pp 600–607

Acknowledgments

We thank Google Inc, in particular Charles Reiss and John Wilkes, for making their invaluable trace data available. This work is supported by ANR project Clouds@home (ANR-09-JCJC-0056-01), also in part by the Advanced Scientific Computing Research Program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and by the INRIA-Illinois Joint Laboratory for Petascale Computing. This paper has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government.

Author information

Correspondence to Sheng Di.

About this article

Cite this article

Di, S., Kondo, D. & Cappello, F. Characterizing and modeling cloud applications/jobs on a Google data center. J Supercomput 69, 139–160 (2014). https://doi.org/10.1007/s11227-014-1131-z
