Research article (public access)
DOI: 10.1145/3330345.3330380

Henosis: workload-driven small array consolidation and placement for HDF5 applications on heterogeneous data stores

Published: 26 June 2019

Abstract

Scientific data analysis pipelines face scalability bottlenecks when processing massive datasets that consist of millions of small files. Such datasets commonly arise in domains as diverse as detecting supernovae and post-processing computational fluid dynamics simulations. Furthermore, applications often use inference frameworks such as TensorFlow and PyTorch whose naive I/O methods exacerbate I/O bottlenecks. One solution is to use scientific file formats, such as HDF5 and FITS, to organize small arrays in one big file. However, storing everything in one file does not fully leverage the heterogeneous data storage capabilities of modern clusters.
This paper presents Henosis, a system that intercepts data accesses inside the HDF5 library and transparently redirects I/O to the in-memory Redis object store or the disk-based TileDB array store. During this process, Henosis consolidates small arrays into bigger chunks and intelligently places them in data stores. A critical research aspect of Henosis is that it formulates object consolidation and data placement as a single optimization problem. Henosis carefully constructs a graph to capture the I/O activity of a workload and produces an initial solution to the optimization problem using graph partitioning. Henosis then refines the solution using a hill-climbing algorithm which migrates arrays between data stores to minimize I/O cost. The evaluation on two real scientific data analysis pipelines shows that consolidation with Henosis makes I/O 300× faster than directly reading small arrays from TileDB and 3.5× faster than workload-oblivious consolidation methods. Moreover, jointly optimizing consolidation and placement in Henosis makes I/O 1.7× faster than strategies that perform consolidation and placement independently.
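The refinement step described above can be illustrated with a minimal sketch. It covers only the hill-climbing idea (migrate one array at a time between stores and keep a move only if it lowers the total I/O cost). The abstract does not give the cost model, the graph construction, or the partitioning step, so everything below is an assumption made for illustration: the `Array` type, the `COST_PER_BYTE` values, the memory capacity constraint, and the function names are hypothetical and are not taken from Henosis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Array:
    name: str
    size: int      # size in bytes (hypothetical units)
    accesses: int  # number of reads observed in the workload


# Hypothetical relative read cost per byte for each store (not measured values).
COST_PER_BYTE = {"redis": 1.0, "tiledb": 10.0}


def io_cost(arrays, placement):
    """Total workload cost under a toy model: accesses * size * per-byte cost."""
    return sum(a.accesses * a.size * COST_PER_BYTE[placement[a.name]] for a in arrays)


def memory_used(arrays, placement):
    """Bytes currently assigned to the in-memory store."""
    return sum(a.size for a in arrays if placement[a.name] == "redis")


def hill_climb(arrays, placement, redis_capacity):
    """First-improvement hill climbing over single-array migrations.

    Starts from `placement` (for example, the output of an earlier
    partitioning step) and stops at the first local optimum.
    """
    placement = dict(placement)  # do not mutate the caller's starting solution
    current = io_cost(arrays, placement)
    improved = True
    while improved:
        improved = False
        for a in arrays:
            old = placement[a.name]
            placement[a.name] = "tiledb" if old == "redis" else "redis"
            candidate = io_cost(arrays, placement)
            feasible = memory_used(arrays, placement) <= redis_capacity
            if feasible and candidate < current:
                current = candidate   # keep the migration
                improved = True
            else:
                placement[a.name] = old  # undo the migration
    return placement, current


arrays = [Array("a", 4, 10), Array("b", 4, 1), Array("c", 4, 5)]
start = {a.name: "tiledb" for a in arrays}
final, cost = hill_climb(arrays, start, redis_capacity=8)
print(final, cost)
```

In this toy run the search stops with arrays `a` and `b` in memory, even though placing `a` and `c` in memory would cost less under the same model. Hill climbing only reaches a local optimum, so the quality of the starting solution matters.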

Published In

ICS '19: Proceedings of the ACM International Conference on Supercomputing
June 2019
533 pages
ISBN: 9781450360791
DOI: 10.1145/3330345
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
