DOI: 10.1145/3127479.3134762
short-paper

Sparkle: Optimizing Spark for Large Memory Machines and Analytics

Published: 24 September 2017

Abstract

Given the growing availability of affordable scale-up servers, our goal is to bring the performance benefits of in-memory processing on scale-up servers to an increasingly common class of data analytics applications: those that process small to medium-sized datasets (up to a few hundred GB) that easily fit in the memory of a typical scale-up server. To achieve this goal, we leverage Spark, an existing memory-centric data analytics framework with widespread adoption among data scientists. Bringing Spark's data analytics capabilities to a scale-up system requires rethinking the original design assumptions, which, although effective for a scale-out system, are a poor match for a scale-up system and result in unnecessary communication and memory inefficiencies.

Published In

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
September 2017
672 pages
ISBN:9781450350280
DOI:10.1145/3127479
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SoCC '17: ACM Symposium on Cloud Computing
September 24-27, 2017
Santa Clara, California

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0

Reflects downloads up to 27 Feb 2025.

Cited By

  • (2021) Memory-centric Architecture for Disaggregated Computers. NTT Technical Review 19:7 (65-69). DOI: 10.53829/ntr202107fa9. Online publication date: Jul-2021
  • (2021) FlashByte: Improving Memory Efficiency with Lightweight Native Storage. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (61-70). DOI: 10.1109/CCGrid51090.2021.00016. Online publication date: May-2021
  • (2021) On Divide&Conquer in Image Processing of Data Monster. Big Data Research 25:C. DOI: 10.1016/j.bdr.2021.100214. Online publication date: 15-Jul-2021
  • (2020) Disaggregating persistent memory and controlling them remotely. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (33-48). DOI: 10.5555/3489146.3489149. Online publication date: 15-Jul-2020
  • (2020) A Survey on Spark Ecosystem: Big Data Processing Infrastructure, Machine Learning, and Applications. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2020.2975652. Online publication date: 2020
  • (2019) ATuMm: Auto-tuning Memory Manager in Apache Spark. 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC) (1-8). DOI: 10.1109/IPCCC47392.2019.8958724. Online publication date: Oct-2019
  • (2018) lpt: A Tool for Tuning the Level of Parallelism of Spark Applications. 2018 25th Asia-Pacific Software Engineering Conference (APSEC) (633-637). DOI: 10.1109/APSEC.2018.00080. Online publication date: Dec-2018
  • (2017) Sandpiper: Scaling probabilistic inferencing to large scale graphical models. 2017 IEEE International Conference on Big Data (Big Data) (383-388). DOI: 10.1109/BigData.2017.8257949. Online publication date: Dec-2017