Abstract
This paper describes an attempt to use a NoSQL database engine to manage custom metadata, using a rich query interface as a motivating and descriptive example of the kind of functionality desired. While the difficulties are numerous, the effort revealed a number of important considerations for how and when to use this alternative technology, as well as initial performance numbers showing the impact of those choices.
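The abstract does not spell out a schema. As a hedged illustration of why key design is one of the considerations that decide how and when such an engine fits, the Python sketch below assumes a sorted key/value table (a simplified model of the row-key ordering used by stores such as Accumulo or HBase) and a hypothetical metadata layout: a primary table keyed by run, timestep, variable and attribute, plus a hand-maintained inverted index for attribute-value queries. `SortedKV`, `MetadataStore`, and both key layouts are illustrative assumptions, not the design evaluated in the paper.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, ...]


class SortedKV:
    """Toy in-memory stand-in for one sorted table (not a real NoSQL client)."""

    def __init__(self) -> None:
        self._keys: List[Key] = []       # kept in sorted order, like row keys
        self._vals: Dict[Key, str] = {}

    def put(self, key: Key, value: str) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key: Key) -> Optional[str]:
        return self._vals.get(key)

    def delete(self, key: Key) -> None:
        if key in self._vals:
            del self._vals[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan_prefix(self, prefix: Key) -> List[Tuple[Key, str]]:
        # Seek to the first key >= prefix, then read forward while the prefix matches.
        i = bisect.bisect_left(self._keys, prefix)
        out: List[Tuple[Key, str]] = []
        while i < len(self._keys) and self._keys[i][: len(prefix)] == prefix:
            out.append((self._keys[i], self._vals[self._keys[i]]))
            i += 1
        return out


class MetadataStore:
    """Hypothetical layout: a primary table plus an application-maintained index."""

    def __init__(self) -> None:
        self.primary = SortedKV()  # (run, timestep, variable, attribute) -> value
        self.index = SortedKV()    # (attribute, value, run, timestep, variable) -> ""

    def tag(self, run: str, ts: str, var: str, attr: str, value: str) -> None:
        # Every write touches both tables; nothing keeps them consistent except this code.
        old = self.primary.get((run, ts, var, attr))
        if old is not None:
            self.index.delete((attr, old, run, ts, var))  # drop the stale index entry
        self.primary.put((run, ts, var, attr), value)
        self.index.put((attr, value, run, ts, var), "")

    def attributes_of(self, run: str, ts: str) -> List[Tuple[str, str, str]]:
        # Follows the primary key order, so it is a single prefix scan.
        return [(k[2], k[3], v) for k, v in self.primary.scan_prefix((run, ts))]

    def find(self, attr: str, value: str) -> List[Tuple[str, ...]]:
        # Also a single prefix scan, but only because the index table exists.
        return [k[2:] for k, _ in self.index.scan_prefix((attr, value))]


meta = MetadataStore()
meta.tag("run1", "t0", "temp", "max", "310")
meta.tag("run1", "t0", "pressure", "has_shock", "yes")
meta.tag("run1", "t1", "temp", "max", "315")
meta.tag("run2", "t0", "pressure", "has_shock", "yes")
print(meta.find("has_shock", "yes"))
print(meta.attributes_of("run1", "t0"))
```

The point the sketch is meant to show: a query that follows the primary key order is one range scan, while a query on attribute values needs a second table that the application must keep in sync on every write, including overwrites.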
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lofstead, J., Ryan, A., Lawson, M. (2019). Adventures in NoSQL for Metadata Management. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_19
DOI: https://doi.org/10.1007/978-3-030-34356-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)