DOI: 10.1145/2534645.2534654
research-article

A framework for an in-depth comparison of scale-up and scale-out

Published: 18 November 2013

Abstract

When data grows too large, we scale to larger systems, either by scaling out or by scaling up. It is understood that scale-out and scale-up have different complexities and bottlenecks, but a thorough comparison of the two architectures is challenging because of the diversity of their programming interfaces, their significantly different system environments, and their sensitivity to workload specifics. In this paper, we propose a novel comparison framework based on MapReduce that accounts for the application, its requirements, and its input size by considering input, software, and hardware parameters. Part of this framework requires implementing scale-out properties on scale-up systems, and we discuss the complex trade-offs, interactions, and dependencies of these properties for two specific case studies (word count and sort). This work lays the foundation for future work in quantifying design decisions and in building a system that automatically compares architectures and selects the best one.
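
As a rough illustration of the two case studies named in the abstract, word count and sort can each be written as a map, shuffle, and reduce pipeline. The sketch below is a single-process Python toy and is not the paper's framework or any Hadoop API; every function name in it (`map_phase`, `shuffle`, `reduce_phase`, `word_count`, `mapreduce_sort`) is a hypothetical name chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks):
    """Word count: each input chunk stands in for one input split."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


def mapreduce_sort(chunks):
    """Sort: the map emits each token as a key, the shuffle orders the keys,
    and the reduce is the identity (each key is repeated once per occurrence)."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    groups = shuffle(mapped)
    return [key for key in sorted(groups) for _ in groups[key]]
```

Calling `word_count(["a b a", "b c"])` counts words across both chunks, and `mapreduce_sort` returns the same tokens in sorted order. On a scale-out system the chunks would live on different nodes and the shuffle would cross the network; on a scale-up system the same three phases would run over shared memory, which is the kind of difference the abstract says the comparison has to account for.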

Published In

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
November 2013
66 pages
ISBN: 9781450325066
DOI: 10.1145/2534645

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC13

Acceptance Rates

DISCS-2013 Paper Acceptance Rate 10 of 19 submissions, 53%;
Overall Acceptance Rate 19 of 34 submissions, 56%

Cited By

  • (2020) Big data analytics for retail industry using MapReduce-Apriori framework. Journal of Management Analytics 7:3 (424-442). DOI: 10.1080/23270012.2020.1728403. Online publication date: 26-Feb-2020
  • (2018) Model-driven optimal resource scaling in cloud. Software and Systems Modeling (SoSyM) 17:2 (509-526). DOI: 10.1007/s10270-017-0584-y. Online publication date: 1-May-2018
  • (2017) Scaling the Computer to the Problem: Application Programming with Unlimited Memory. Computer 50:8 (46-51). DOI: 10.1109/MC.2017.3001249. Online publication date: 2017
  • (2017) An iso-time scaling method for big data tasks executing on parallel computing systems. The Journal of Supercomputing 73:10 (4493-4516). DOI: 10.1007/s11227-017-2029-3. Online publication date: 1-Oct-2017
  • (2016) Performance evaluation of in-memory computing on scale-up and scale-out cluster. 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN) (456-461). DOI: 10.1109/ICUFN.2016.7537070. Online publication date: Jul-2016
  • (2015) Evaluation of MapReduce in a Large Cluster. Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing (461-468). DOI: 10.1109/CLOUD.2015.68. Online publication date: 27-Jun-2015
  • (2014) Handling big data on agent-based modeling of online social networks with mapreduce. Proceedings of the 2014 Winter Simulation Conference (851-862). DOI: 10.5555/2693848.2693962. Online publication date: 7-Dec-2014
  • (2014) Handling big data on agent-based modeling of Online Social Networks with MapReduce. Proceedings of the Winter Simulation Conference 2014 (851-862). DOI: 10.1109/WSC.2014.7019946. Online publication date: Dec-2014
  • (2014) Modeling the Impact of Workload on Cloud Resource Scaling. Proceedings of the 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (310-317). DOI: 10.1109/SBAC-PAD.2014.16. Online publication date: 22-Oct-2014
  • (2014) SupMR. Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (1505-1514). DOI: 10.1109/IPDPSW.2014.168. Online publication date: 19-May-2014
