short-paper

Hybrid Clustering Framework for Multi-dimensional Array Data

Authors:
Hyeon Park, Dae-Heon Park, Eun-Ju Lee, Se-Han Kim

ETRI, Yuseong-gu, Daejeon, Korea

BigDAS '15: Proceedings of the 2015 International Conference on Big Data Applications and Services, October 2015, Pages 225–228
https://doi.org/10.1145/2837060.2837101

Published: 20 October 2015

ABSTRACT

Satellite imagery, which consists of multi-dimensional array data, is now analyzed in a wide range of applications, and several frameworks for analyzing this kind of scientific data have been introduced.

Processing scientific data such as satellite imagery comes with several constraints. For large-scale analysis, the aggregated data must be stored in specific data formats; for time-series analysis of very large data sets, a dedicated file system is needed because the data grows rapidly; and so on. Although Hadoop is a popular big data computing platform, it is not well suited to scientific data, because it does not support processing data in the various scientific formats. SciDB, on the other hand, is a data management system designed mainly for large-scale array data, but it is not appropriate for analyzing scalable time-series data. In this paper, we propose a hybrid clustering framework for processing scientific data composed of multi-dimensional arrays with time series.

The proposed framework addresses these issues by both processing array-based scientific data and handling ever-increasing data at the same time.

Published in
BigDAS '15: Proceedings of the 2015 International Conference on Big Data Applications and Services
October 2015
321 pages
ISBN: 9781450338462
DOI: 10.1145/2837060
Conference Chairs: Jongsup Choi, Sun Hwa Han, Joo-Yeoun Lee, Taeho Park
Editor: Aziz Nasridinov
Program Chairs: Carson K. Leung, Yoo-Sung Kim, Young-Koo Lee
Copyright © 2015 ACM
© 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Big data
Hadoop
Hybrid clustering
Satellite imagery
SciDB
Qualifiers
- short-paper
- Research
- Refereed limited