research-article

A framework for an in-depth comparison of scale-up and scale-out

Authors: Michael Sevilla, Kleoni Ioannidou, Carlos Maltzahn

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

Pages 13 - 18

https://doi.org/10.1145/2534645.2534654

Published: 18 November 2013

Abstract

When data grows too large, we scale to larger systems, either by scaling out or by scaling up. It is understood that scale-out and scale-up have different complexities and bottlenecks, but a thorough comparison of the two architectures is challenging because of the diversity of their programming interfaces, their significantly different system environments, and their sensitivity to workload specifics. In this paper, we propose a novel comparison framework based on MapReduce that accounts for the application, its requirements, and its input size by considering input, software, and hardware parameters. Part of this framework requires implementing scale-out properties on scale-up, and we discuss the complex trade-offs, interactions, and dependencies of these properties for two specific case studies (word count and sort). This work lays the foundation for future work in quantifying design decisions and in building a system that automatically compares architectures and selects the best one.
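
For readers unfamiliar with the two case studies named in the abstract, the following is a minimal single-process sketch of how word count and sort are commonly expressed as map and reduce steps. It is an illustration only and not the authors' framework or implementation; every function name here is hypothetical, and the in-memory `shuffle` merely stands in for the grouping step that a real MapReduce runtime performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain


def map_word_count(line):
    """Map step (hypothetical name): emit (word, 1) for each whitespace-separated token."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the shuffle between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word_count(word, counts):
    """Reduce step: sum the partial counts collected for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = chain.from_iterable(map_word_count(line) for line in lines)
    return dict(reduce_word_count(w, c) for w, c in shuffle(pairs).items())


def run_sort(records):
    """Sort as MapReduce: identity map and reduce; the order comes from sorted keys."""
    groups = shuffle((record, None) for record in records)
    # Emit each key once per occurrence so duplicates are preserved.
    return [key for key in sorted(groups) for _ in groups[key]]
```

In this sketch, word count does its work in the reduce step (summing), whereas sort does essentially no work in map or reduce and relies entirely on key ordering during the shuffle, which is one reason the two workloads tend to stress a system differently.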



Published In

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

November 2013

66 pages

ISBN: 9781450325066

DOI: 10.1145/2534645

General Chair: Xian-He Sun (Illinois Institute of Technology)

Program Chairs: Yong Chen (Texas Tech University), Philip C. Roth (Oak Ridge National Laboratory)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 18, 2013

Denver, Colorado

Acceptance Rates

DISCS-2013 Paper Acceptance Rate: 10 of 19 submissions, 53%

Overall Acceptance Rate: 19 of 34 submissions, 56%

Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 198
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 2

Reflects downloads up to 25 Dec 2024

Citations

Cited By

Verma N, Malhotra D, Singh J (2020). Big data analytics for retail industry using MapReduce-Apriori framework. Journal of Management Analytics 7:3 (424-442). Online publication date: 26-Feb-2020.
https://doi.org/10.1080/23270012.2020.1728403
Gandhi A, Dube P, Karve A, Kochut A, Zhang L (2018). Model-driven optimal resource scaling in cloud. Software and Systems Modeling (SoSyM) 17:2 (509-526). Online publication date: 1-May-2018.
https://dl.acm.org/doi/10.1007/s10270-017-0584-y
Nassi I (2017). Scaling the Computer to the Problem: Application Programming with Unlimited Memory. Computer 50:8 (46-51). Online publication date: 2017.
https://doi.org/10.1109/MC.2017.3001249
Zeng G, Liu W (2017). An iso-time scaling method for big data tasks executing on parallel computing systems. The Journal of Supercomputing 73:10 (4493-4516). Online publication date: 1-Oct-2017.
https://dl.acm.org/doi/10.1007/s11227-017-2029-3
Taekyung Yoo, Minsub Yim, Ilgyun Jeong, Yunsu Lee, Seung-Tae Chun (2016). Performance evaluation of in-memory computing on scale-up and scale-out cluster. 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN) (456-461). Online publication date: Jul-2016.
https://doi.org/10.1109/ICUFN.2016.7537070
Kc K, Hsu C, Freeh V (2015). Evaluation of MapReduce in a Large Cluster. Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing (461-468). Online publication date: 27-Jun-2015.
https://dl.acm.org/doi/10.1109/CLOUD.2015.68
de C. Gatti M, Vieira M, de Melo J, Cavalin P, Pinhanez C, Buckley S, Miller J (2014). Handling big data on agent-based modeling of online social networks with mapreduce. Proceedings of the 2014 Winter Simulation Conference (851-862). Online publication date: 7-Dec-2014.
https://dl.acm.org/doi/10.5555/2693848.2693962
de C. Gatti M, Vieira M, de Melo J, Cavalin P, Pinhanez C (2014). Handling big data on agent-based modeling of Online Social Networks with MapReduce. Proceedings of the Winter Simulation Conference 2014 (851-862). Online publication date: Dec-2014.
https://doi.org/10.1109/WSC.2014.7019946
Gandhi A, Dube P, Karve A, Kochut A, Zhang L (2014). Modeling the Impact of Workload on Cloud Resource Scaling. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (310-317). Online publication date: 22-Oct-2014.
https://dl.acm.org/doi/10.1109/SBAC-PAD.2014.16
Sevilla M, Nassi I, Ioannidou K, Brandt S, Maltzahn C (2014). SupMR. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (1505-1514). Online publication date: 19-May-2014.
https://dl.acm.org/doi/10.1109/IPDPSW.2014.168
