ABSTRACT
As satellite imagery, which consists of multidimensional array data, is increasingly used in a wide range of analysis applications, frameworks for analyzing this kind of scientific data have been introduced.
Processing scientific data such as satellite imagery is subject to several constraints. For example, large-scale analysis requires the aggregated data to be stored in specific data formats, and large-scale time-series analysis requires a dedicated file system because the data volume grows rapidly. Although Hadoop is a popular big data computing platform, it is not well suited to scientific data, because it does not support processing data stored in the various scientific formats. SciDB, on the other hand, is a data management system designed mainly for large-scale array data, but it is not well suited to analyzing time-series data whose volume keeps growing. In this paper, we propose a hybrid clustering framework for processing scientific data composed of multidimensional arrays with a time dimension.
The proposed framework addresses both issues, supporting the processing of array-based scientific data and the handling of ever-increasing data volumes at the same time.
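As a generic illustration, and not the implementation of the framework proposed here, the following sketch shows one way a MapReduce-style system can process a growing time series of 2D arrays such as satellite image frames: split the series along the time axis, compute per-pixel partial aggregates for each chunk (map), and merge the partials (reduce). Because the partials combine associatively, newly arriving frames can be folded into an existing result without reprocessing earlier data. The function names, the plain nested-list array representation, and the choice of a per-pixel temporal mean are all assumptions made for illustration.

```python
from functools import reduce
from typing import List, Tuple

Grid = List[List[float]]          # one 2D frame (rows x cols)
Partial = Tuple[Grid, int]        # per-pixel sums and number of frames summed


def chunk_by_time(frames: List[Grid], chunk_size: int) -> List[List[Grid]]:
    """Split a (time, rows, cols) series into consecutive time chunks."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def map_chunk(chunk: List[Grid]) -> Partial:
    """Map step (hypothetical): per-pixel sum over one chunk plus its frame count."""
    rows, cols = len(chunk[0]), len(chunk[0][0])
    sums = [[0.0] * cols for _ in range(rows)]
    for frame in chunk:
        for r in range(rows):
            for c in range(cols):
                sums[r][c] += frame[r][c]
    return sums, len(chunk)


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step (hypothetical): element-wise merge of two partial results."""
    (sa, na), (sb, nb) = a, b
    merged = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(sa, sb)]
    return merged, na + nb


def finalize(partial: Partial) -> Grid:
    """Turn accumulated sums into a per-pixel temporal mean."""
    sums, n = partial
    return [[v / n for v in row] for row in sums]


def temporal_mean(frames: List[Grid], chunk_size: int) -> Grid:
    """Batch job: map every time chunk, reduce the partials, then finalize."""
    partials = [map_chunk(ch) for ch in chunk_by_time(frames, chunk_size)]
    return finalize(reduce(reduce_partials, partials))


if __name__ == "__main__":
    frames = [[[1, 2], [3, 4]], [[3, 4], [5, 6]], [[5, 6], [7, 8]]]
    print(temporal_mean(frames, chunk_size=2))  # [[3.0, 4.0], [5.0, 6.0]]

    # Incremental update: fold a newly arrived frame into the earlier partial.
    earlier = map_chunk(frames[:2])
    updated = reduce_partials(earlier, map_chunk(frames[2:]))
    print(finalize(updated))                    # same result as the batch job
```

The incremental update at the end is the property that matters for ever-growing time series: only the new chunk is mapped, and the stored partial absorbs it with a single reduce.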