ABSTRACT
With advances in big data collection and storage technology, the analysis and use of big data has recently expanded across the public sector and various industries. In the manufacturing and financial sectors especially, demand for real-time analysis of big data is very high. Existing studies on big data analysis have mainly assumed batch processing. In recent years, studies on real-time analytics using Spark, Storm, and in-memory data grids (IMDG) have been underway. In this regard, this paper evaluates the processing performance of principal component analysis using open-source Spark, an in-memory distributed processing framework. Such in-memory processing is necessary for real-time analysis and fast computation on large amounts of data. The paper shows how fast Spark is by comparison with open-source R and also investigates the distributed processing capability of Spark according to the node configuration.