DOI: 10.1145/2905055.2905326
research-article

Big Data Analytics Based on In-Memory Infrastructure On Traditional HPC: A Survey

Published: 04 March 2016

Abstract

As main-memory capacity grows, in-memory big data analytics is becoming more popular. In-memory technologies support interactive analysis by providing high I/O throughput. On traditional high performance computing (HPC) platforms, big data processing needs data-intensive systems for large-scale data storage and computation-intensive systems for high-speed processing. Many tools and technologies are now available that support memory-centric data processing and analysis. Taking advantage of in-memory processing on an HPC platform can result in faster, more reliable, and fault-tolerant data analysis. In this paper, we survey the existing storage and computation engines for big data analysis and their performance when integrated. We also discuss how such infrastructures help address many I/O-intensive analytical problems.

Published In

ICTCS '16: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies
March 2016
843 pages
ISBN:9781450339629
DOI:10.1145/2905055
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Analytics
  2. Big-data
  3. Hadoop
  4. High performance Computing
  5. I/O throughput
  6. In-memory
  7. Spark
  8. Tachyon

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICTCS '16

Acceptance Rates

Overall Acceptance Rate 97 of 270 submissions, 36%
