DOI: 10.1145/2837060.2837074
research-article

Performance Evaluation of Apache Spark According to the Number of Nodes using Principal Component Analysis

Published: 20 October 2015

ABSTRACT

With the development of big data collection and storage technology, analysis aimed at putting such data to use has recently expanded across the public sector and various industries. Especially in the manufacturing and financial sectors, demand for real-time analysis of big data is very high. Existing studies on big data analysis have mainly assumed batch processing. In recent years, studies on real-time analytics using Spark, Storm, and IMDG have been underway. In this regard, this paper evaluates the processing performance of principal component analysis using open source Spark, an in-memory distributed processing framework of the kind needed for real-time analysis and fast operation on large amounts of data. The paper shows how fast Spark is in comparison with open source R and also investigates the distributed processing capability of Spark according to the node configuration.

  • Published in

    BigDAS '15: Proceedings of the 2015 International Conference on Big Data Applications and Services
    October 2015
    321 pages
    ISBN:9781450338462
    DOI:10.1145/2837060

    Copyright © 2015 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited