DOI: 10.1145/2903150.2911719
research-article

An in-memory based framework for scientific data analytics

Published: 16 May 2016

Abstract

This work presents the in-memory I/O server implemented in the context of the Ophidia framework, a big data analytics stack for the scientific analysis of n-dimensional datasets. The I/O server is a key component of the Ophidia 2.0 architecture proposed in this paper. It exploits (i) a NoSQL approach to manage scientific data at the storage level, (ii) user-defined functions to perform array-based analytics, (iii) the Ophidia Storage API to manage heterogeneous back-ends through a plugin-based approach, and (iv) an in-memory, parallel analytics engine to address scalability and performance. The paper also reports preliminary performance results from a statistical analytics kernel benchmark run on an HPC cluster at the CMCC SuperComputing Centre.
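
As a rough illustration of how the four ingredients named in the abstract could fit together, the following Python sketch combines a plugin registry for storage back-ends, an in-memory key-value store of array fragments, and a user-defined function applied to the fragments in parallel. It is not taken from Ophidia or its Storage API: the names `BACKENDS`, `register_backend`, `MemoryBackend` and `apply_udf` are hypothetical, and a thread pool stands in for the parallel engine.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical plugin registry (back-end name -> class); not the Ophidia Storage API.
BACKENDS: Dict[str, type] = {}


def register_backend(name: str):
    """Class decorator that makes a storage back-end selectable by name."""
    def decorator(cls):
        BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class MemoryBackend:
    """Key-value (NoSQL-style) store that keeps array fragments in RAM."""

    def __init__(self) -> None:
        self._fragments: Dict[str, List[float]] = {}

    def put(self, key: str, values: Sequence[float]) -> None:
        self._fragments[key] = list(values)

    def get(self, key: str) -> List[float]:
        return self._fragments[key]

    def keys(self) -> List[str]:
        return sorted(self._fragments)


def apply_udf(backend, udf: Callable[[List[float]], float], workers: int = 2) -> Dict[str, float]:
    """Apply a user-defined array function to every fragment using a thread pool."""
    keys = backend.keys()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda k: udf(backend.get(k)), keys)
        return dict(zip(keys, results))


# Usage: pick a back-end by name, load two fragments, run two array UDFs.
store = BACKENDS["memory"]()
store.put("frag0", [1.0, 2.0, 3.0])
store.put("frag1", [10.0, 20.0])
maxima = apply_udf(store, max)
means = apply_udf(store, mean)
```

Under these assumptions, swapping the storage layer means registering another class under a different name, while the analytics code in `apply_udf` stays unchanged; that separation is the point of a plugin-based storage interface.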

Published In

CF '16: Proceedings of the ACM International Conference on Computing Frontiers
May 2016
487 pages
ISBN: 9781450341288
DOI: 10.1145/2903150
  • General Chairs: Gianluca Palermo, John Feo
  • Program Chairs: Antonino Tumeo, Hubertus Franke
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. big data analytics
  2. in-memory analytics engine
  3. scientific data management

Qualifiers

  • Research-article

Funding Sources

  • EU H2020 EUBRA-BIGSEA
  • Italian Ministry of Education, Universities and Research
  • EU H2020 ESiWACE
  • U.S. Department of Energy, Office of Science

Conference

CF'16: Computing Frontiers Conference
May 16-19, 2016
Como, Italy

Acceptance Rates

CF '16 Paper Acceptance Rate 30 of 94 submissions, 32%;
Overall Acceptance Rate 273 of 785 submissions, 35%

Cited By

  • (2023) End-to-End Workflows for Climate Science: Integrating HPC Simulations, Big Data Processing, and Machine Learning. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 2042-2052. DOI: 10.1145/3624062.3624283. Online publication date: 12-Nov-2023
  • (2021) Towards HPC and Big Data Analytics Convergence: Design and Experimental Evaluation of a HPDA Framework for eScience at Scale. IEEE Access, 9, pp. 73307-73326. DOI: 10.1109/ACCESS.2021.3079139. Online publication date: 2021
  • (2019) BigDataCube: A Scalable, Federated Service Platform for Copernicus. 2019 IEEE International Conference on Big Data (Big Data), pp. 4103-4112. DOI: 10.1109/BigData47090.2019.9006222. Online publication date: Dec-2019
  • (2019) An Integrated Big and Fast Data Analytics Platform for Smart Urban Transportation Management. IEEE Access, 7, pp. 117652-117677. DOI: 10.1109/ACCESS.2019.2936941. Online publication date: 2019
  • (2019) Towards High Performance Data Analytics for Climate Change. High Performance Computing, pp. 240-257. DOI: 10.1007/978-3-030-34356-9_20. Online publication date: 3-Dec-2019
  • (2017) Review on Big Data & Analytics – Concepts, Philosophy, Process and Applications. Cybernetics and Information Technologies, 17:2, pp. 3-27. DOI: 10.1515/cait-2017-0013. Online publication date: 26-Jun-2017
  • (2017) Big Data Analytics on Large-Scale Scientific Datasets in the INDIGO-DataCloud Project. Proceedings of the Computing Frontiers Conference, pp. 343-348. DOI: 10.1145/3075564.3078884. Online publication date: 15-May-2017
  • (2017) On the Use of In-Memory Analytics Workflows to Compute eScience Indicators from Large Climate Datasets. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 1035-1043. DOI: 10.1109/CCGRID.2017.132. Online publication date: 14-May-2017
  • (2016) Distributed and cloud-based multi-model analytics experiments on large volumes of climate change data in the earth system grid federation eco-system. 2016 IEEE International Conference on Big Data (Big Data), pp. 2911-2918. DOI: 10.1109/BigData.2016.7840941. Online publication date: Dec-2016
