ABSTRACT
Apache Spark is a high-speed in-memory computing engine that runs on the JVM. As data volumes grow, performance optimization requires careful management of the JVM heap space, and in particular of garbage collector (GC) pause times, which directly affect application performance. Spark exposes parameters that control JVM heap size and GC behavior; passing an appropriate heap size together with an appropriate GC algorithm is a performance-optimization technique known as Spark garbage collection tuning. To reduce GC overhead, an experiment was conducted that adjusted these parameters during data loading, DataFrame creation, and data retrieval. The results show a 3.23% improvement in latency and a 1.62% improvement in throughput compared with the default parameter configuration.
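The tuning described above amounts to passing heap-size and GC-type options to Spark at submit time. A minimal sketch of such an invocation is shown below; the memory sizes, the choice of G1GC, and the script name `my_job.py` are illustrative assumptions, not the exact settings used in the experiment.

```shell
# Illustrative only: heap sizes and the G1GC choice are example values,
# not the configuration reported in the paper's experiment.
spark-submit \
  --conf spark.executor.memory=4g \
  --conf spark.driver.memory=2g \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -verbose:gc" \
  --conf spark.memory.fraction=0.6 \
  my_job.py
```

Here `spark.executor.memory` sets the executor JVM heap, `spark.executor.extraJavaOptions` selects the collector and its pause-time goal, and `spark.memory.fraction` controls how much of the heap Spark reserves for execution and storage; these are the kinds of knobs varied in GC tuning.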