Consolidated cluster systems for data centers in the cloud age: a survey and analysis

Lin, Jian; Zha, Li; Xu, Zhiwei

doi:10.1007/s11704-012-2086-y

Review Article
Published: 28 November 2012

Volume 7, pages 1–19 (2013)

Frontiers of Computer Science

Abstract

In the cloud age, heterogeneous application modes running on large-scale infrastructures pose challenges to resource utilization and manageability in data centers. Many resource and runtime management systems have been developed, or have evolved, to address these challenges and related problems from different perspectives. Through a survey and analysis, this paper aims to identify the main motivations, key concerns, common features, and representative solutions of such systems. A typical class of these systems is generalized as the consolidated cluster system, whose design goal is identified as reducing overall costs while meeting quality-of-service (QoS) requirements. We survey this class of systems and summarize the critical issues they address as resource consolidation and runtime coordination. These two issues are analyzed and classified according to the design styles and external characteristics abstracted from the surveyed work. Based on this analysis and classification, five representative consolidated cluster systems from both academia and industry are described and compared in detail. We hope this survey and analysis will assist both the design and implementation of such systems and technology selection among them, in response to the continually emerging challenges of infrastructure and application management in data centers.
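As a purely illustrative sketch, and not the method of any system surveyed in the paper, the stated design goal (reducing overall cost under a QoS premise) can be read as packing workloads onto as few active nodes as possible while keeping every node below a utilization cap. The example below uses first-fit decreasing bin packing; the single CPU dimension, the `qos_cap` headroom parameter, and the function name `consolidate` are hypothetical assumptions made for illustration only.

```python
from typing import Dict, List


def consolidate(demands: Dict[str, float], node_capacity: float,
                qos_cap: float = 0.8) -> List[List[str]]:
    """Hypothetical first-fit-decreasing consolidation sketch.

    Assumptions (illustrative only): each workload has one scalar CPU
    demand, and a node "meets QoS" only while its total load stays at or
    below qos_cap * node_capacity. Fewer active nodes stands in for lower cost.
    """
    limit = qos_cap * node_capacity
    nodes: List[List[str]] = []  # workload names placed on each active node
    loads: List[float] = []      # current total load of each active node

    # Place the largest workloads first; FFD typically needs fewer nodes
    # than placing workloads in arbitrary order.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > limit:
            raise ValueError(f"{name} cannot fit on any node within the QoS cap")
        for i, load in enumerate(loads):
            if load + demand <= limit:
                nodes[i].append(name)
                loads[i] += demand
                break
        else:
            # No active node has enough headroom: activate a new node.
            nodes.append([name])
            loads.append(demand)
    return nodes


if __name__ == "__main__":
    demo = {"batch": 4.0, "etl": 3.0, "web": 2.0, "mpi": 1.5}
    print(consolidate(demo, node_capacity=8.0, qos_cap=0.75))
    # → [['batch', 'web'], ['etl', 'mpi']]
```

Here four workloads fit on two nodes capped at 6.0 units each. Lowering `qos_cap` leaves more headroom per node but may force additional nodes to be activated, which is one way to picture the cost-versus-QoS trade-off the abstract refers to.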

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Jian Lin, Li Zha & Zhiwei Xu
Graduate University of Chinese Academy of Sciences, Beijing, 100049, China
Jian Lin

Corresponding author

Correspondence to Jian Lin.

Additional information

Jian Lin is a PhD candidate in computer architecture at the Institute of Computing Technology, Chinese Academy of Sciences. His current research interests include distributed software architecture, large-scale resource management, and security technologies in grid and cloud computing systems.

Li Zha obtained his PhD in 2003 and is an associate professor at the Institute of Computing Technology, Chinese Academy of Sciences. He has led several national-level research programs. His research focuses on large-scale distributed resource management, data storage, processing, and retrieval, and system-level optimization. His interests also include other classic issues in the fields of distributed computing and grid computing.

Zhiwei Xu received his PhD from the University of Southern California in 1987. He is currently a professor at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include network computing, distributed operating systems, and high-performance computer architecture. He serves on the editorial boards of the IEEE Transactions on Services Computing, Journal of Grid Computing, Journal of Computer Science and Technology, and Journal of Computer Research and Development. He is a senior member of the IEEE.

About this article

Cite this article

Lin, J., Zha, L. & Xu, Z. Consolidated cluster systems for data centers in the cloud age: a survey and analysis. Front. Comput. Sci. 7, 1–19 (2013). https://doi.org/10.1007/s11704-012-2086-y

Received: 14 March 2012
Accepted: 03 June 2012
Published: 28 November 2012
Issue Date: February 2013
DOI: https://doi.org/10.1007/s11704-012-2086-y
