CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications

Luo, Chunjie; Zhan, Jianfeng; Jia, Zhen; Wang, Lei; Lu, Gang; Zhang, Lixin; Xu, Cheng-Zhong; Sun, Ninghui

doi:10.1007/s11704-012-2118-7


Research Article
Published: 03 August 2012

Volume 6, pages 347–362, (2012)

Frontiers of Computer Science

Chunjie Luo¹,
Jianfeng Zhan¹,
Zhen Jia¹,
Lei Wang¹,
Gang Lu¹,
Lixin Zhang¹,
Cheng-Zhong Xu²,³ &
Ninghui Sun¹


Abstract

With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, there is no existing benchmark suite for evaluating cloud performance at the whole-system level. This paper proposes CloudRank-D, to the best of our knowledge the first benchmark suite for benchmarking and ranking cloud computing systems that are shared for running big data applications. We analyze the limitations of previous metrics, e.g., floating point operations, for evaluating a cloud computing system, and propose two simple, complementary metrics: data processed per second and data processed per Joule. We detail the design of CloudRank-D, which considers representative applications, diversity of data characteristics, and dynamic behaviors of both applications and system software platforms. Through experiments, we demonstrate the advantages of our proposed metrics. In several case studies, we evaluate two small-scale deployments of cloud computing systems using CloudRank-D.
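
The abstract names the two metrics but does not give their formulas. As a minimal sketch, assume that data processed per second is the total input data size divided by the wall-clock run time, and that data processed per Joule is the same data size divided by the energy consumed during the run. The class, field, and function names below are hypothetical, and the numbers are illustrative rather than results from the paper.

```python
from dataclasses import dataclass


@dataclass
class WorkloadRun:
    """One benchmark run. All field names are hypothetical, not from the paper."""
    data_bytes: float       # total size of the input data processed
    elapsed_seconds: float  # wall-clock time of the whole run
    energy_joules: float    # energy consumed by the system during the run


def data_processed_per_second(run: WorkloadRun) -> float:
    """Assumed definition: bytes processed divided by elapsed time."""
    if run.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return run.data_bytes / run.elapsed_seconds


def data_processed_per_joule(run: WorkloadRun) -> float:
    """Assumed definition: bytes processed divided by energy consumed."""
    if run.energy_joules <= 0:
        raise ValueError("energy_joules must be positive")
    return run.data_bytes / run.energy_joules


# Illustrative values only: 200 GB processed in 400 s using 1 MJ of energy.
run = WorkloadRun(data_bytes=200e9, elapsed_seconds=400.0, energy_joules=1.0e6)
print(data_processed_per_second(run))  # bytes per second
print(data_processed_per_joule(run))   # bytes per Joule
```

Under these assumptions the two numbers answer different questions: the first ranks systems by throughput, the second by energy efficiency, so a system can lead on one and trail on the other.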



Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100019, China
Chunjie Luo, Jianfeng Zhan, Zhen Jia, Lei Wang, Gang Lu, Lixin Zhang & Ninghui Sun
Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI, 48202, USA
Cheng-Zhong Xu
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
Cheng-Zhong Xu


Corresponding author

Correspondence to Jianfeng Zhan.

Additional information

Chunjie Luo is a Master’s student at the Institute of Computing Technology, Chinese Academy of Sciences. His research interest focuses on data center computing. He received his BS in 2009 from Huazhong University of Science and Technology in China.

Jianfeng Zhan received his PhD in Computer Engineering from the Chinese Academy of Sciences, Beijing, China, in 2002. He is currently an associate professor of computer science with the Institute of Computing Technology, Chinese Academy of Sciences. He was a recipient of the Second-class Chinese National Technology Promotion Prize in 2006, and the Distinguished Achievement Award of the Chinese Academy of Sciences in 2005.

Zhen Jia is a PhD candidate in Computer Science at the Institute of Computing Technology, Chinese Academy of Sciences. His research focuses on parallel and distributed systems, benchmarks, and data center workload characterization. He received his BS in 2010 from Dalian University of Technology in China.

Lei Wang received his MS in Computer Engineering from the Chinese Academy of Sciences, Beijing, China, in 2006. He is currently a senior engineer with the Institute of Computing Technology, Chinese Academy of Sciences. His current research interests include resource management of cloud systems.

Gang Lu received his Bachelor’s degree in computer science in 2010 from Huazhong University of Science and Technology in China. He is a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include distributed and parallel systems.

Lixin Zhang is a professor and vice general engineer of Systems at the Institute of Computing Technology, Chinese Academy of Sciences. He is the director of the Advanced Computer Systems Laboratory. His main research areas include computer architecture, data center computing, high performance computing, advanced memory systems, and workload characterization. Dr. Zhang received his BS in computer science from Fudan University in 1993 and his PhD in computer science from the University of Utah in 2001. He was previously a research staff member at IBM Austin Research Lab and a Master Inventor of IBM.

Cheng-Zhong Xu received his PhD from the University of Hong Kong in 1993. He is currently a tenured professor at Wayne State University and the director of the Center for Cloud Computing at the Shenzhen Institute of Advanced Technology of CAS. His research interests are in parallel and distributed systems and cloud computing. He has published more than 180 papers in journals and conferences. He serves on a number of journal editorial boards, including IEEE TPDS and JPDC. He was a recipient of the Faculty Research Award, Career Development Chair Award, and the President’s Award for Excellence in Teaching of WSU. He was also a recipient of the “Outstanding Oversea Scholar” award of NSFC.

Ninghui Sun is a professor and the director of the Institute of Computing Technology (ICT) of the Chinese Academy of Sciences (CAS). He graduated from Peking University in 1989, and received his MS and PhD degrees from ICT of CAS in 1992 and 1999, respectively. Prof. Sun is the architect and main designer of the Dawning2000, Dawning3000, Dawning4000, Dawning5000, and Dawning6000 high performance computers.


About this article

Cite this article

Luo, C., Zhan, J., Jia, Z. et al. CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications. Front. Comput. Sci. 6, 347–362 (2012). https://doi.org/10.1007/s11704-012-2118-7


Received: 28 March 2012
Accepted: 15 June 2012
Published: 03 August 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s11704-012-2118-7
