DOI: 10.1145/3375998.3376039
research-article

Spark Performance Optimization Analysis in Memory Tuning On GC Overhead for Big Data Analytics

Published: 28 January 2020

ABSTRACT

Apache Spark is a high-speed in-memory computing engine that runs on the JVM. As data volumes grow, Spark applications need performance optimization, which requires managing the JVM heap space. Managing the heap in turn means controlling garbage collector (GC) pause time, which affects application performance. Spark accepts several parameters that control JVM heap space and GC overhead. Passing an appropriate heap size together with an appropriate GC type is one such optimization, known as Spark garbage collection tuning. To reduce GC overhead, an experiment was conducted in which certain parameters were adjusted for data loading, DataFrame creation, and data retrieval. The results show a 3.23% improvement in latency and a 1.62% improvement in throughput with the garbage collection tuning approach, compared with the default parameter configuration.
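The abstract does not list the parameters that were tuned. As a rough illustration of what "passing a heap size and a GC type to Spark" can look like, the sketch below assembles `spark-submit` arguments from the standard `spark.executor.memory` and `spark.executor.extraJavaOptions` settings. The helper name `gc_tuning_args`, the `4g` heap size, the choice of collectors, and the `app.py` script name are all assumptions made for this example, not the paper's configuration.

```python
import shlex

# Hypothetical mapping from a short collector name to the HotSpot flag that selects it.
GC_FLAGS = {
    "g1": "-XX:+UseG1GC",
    "parallel": "-XX:+UseParallelGC",
}


def gc_tuning_args(executor_memory, collector, log_gc=True):
    """Return spark-submit arguments that set the executor heap size and collector.

    This is an illustrative helper, not the method used in the paper.
    """
    if collector not in GC_FLAGS:
        raise ValueError(f"unknown collector: {collector!r}")
    java_opts = [GC_FLAGS[collector]]
    if log_gc:
        # Java 8 style GC logging flags, so pause times show up in executor logs.
        java_opts += ["-verbose:gc", "-XX:+PrintGCDetails"]
    return [
        # The heap size goes through spark.executor.memory; Spark does not allow
        # -Xmx inside extraJavaOptions.
        "--conf", f"spark.executor.memory={executor_memory}",
        "--conf", "spark.executor.extraJavaOptions=" + " ".join(java_opts),
    ]


if __name__ == "__main__":
    # Assumed values: a 4 GB executor heap, the G1 collector, a script named app.py.
    print(shlex.join(["spark-submit", *gc_tuning_args("4g", "g1"), "app.py"]))
```

Comparing runs that differ only in these arguments, with GC logging enabled, is one way to attribute latency or throughput changes to the collector and heap settings rather than to the workload.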

  • Published in

    ACM Other conferences
    ICNCC '19: Proceedings of the 2019 8th International Conference on Networks, Communication and Computing
    December 2019
    263 pages
    ISBN: 9781450377027
    DOI: 10.1145/3375998

    Copyright © 2019 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
