Big Data Analytics Based on In-Memory Infrastructure On Traditional HPC: A Survey

ICTCS '16: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies

Article No.: 110, Pages 1 - 5

https://doi.org/10.1145/2905055.2905326

Published: 04 March 2016

Abstract

As main-memory capacity grows, in-memory big data analytics is becoming more popular. In-memory technologies support interactive analysis by providing high I/O throughput. On traditional high performance computing (HPC) platforms, big data processing needs both data-intensive and computation-intensive systems, for large-scale data storage and high-speed processing respectively. Many tools and technologies that support memory-centric data processing and analysis are currently available. Taking advantage of in-memory technology on an HPC platform can yield high-speed, more reliable, and fault-tolerant data analysis. In this paper, we survey the existing storage and computation engines for big data analysis and their performance when integrated. We also discuss the contribution of such infrastructures to solving many I/O-intensive analytical problems.

Published In

ICTCS '16: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies

March 2016

843 pages

ISBN: 9781450339629

DOI: 10.1145/2905055

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICTCS '16

ICTCS '16: Second International Conference on Information and Communication Technology for Competitive Strategies

March 4 - 5, 2016

Udaipur, India

Acceptance Rates

Overall Acceptance Rate 97 of 270 submissions, 36%

