IMSS: In-Memory Storage System for Data Intensive Applications

  • Conference paper
  • In: High Performance Computing. ISC High Performance 2022 International Workshops (ISC High Performance 2022)

Abstract

Computer applications are growing in terms of data management requirements. In both scientific and engineering domains, high-performance computing clusters tend to experience bottlenecks in the I/O layer, limiting the scalability of data-intensive applications. Thus, minimizing the number of cycles required by I/O operations constitutes a widely addressed challenge. To cope with this constraint, distributed in-memory store solutions provide a network-attached storage system that uses the main memory of the compute nodes as the storage device. This approach offers temporary but faster storage than solutions based on non-volatile memory such as SSDs. This work presents IMSS, a novel ad-hoc in-memory storage system focused on data management and data distribution. Our solution accelerates both data and metadata management by taking advantage of ZeroMQ, a fast and flexible communication mechanism. One of the main contributions of IMSS is that it incorporates multiple distribution policies for optimizing network performance and improving load balance. The experimental evaluation demonstrates that our proposal outperforms Redis, a well-known in-memory data structure store, in both write and read data accesses.
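
To make the idea concrete, the sketch below illustrates the kind of system the abstract describes: a client for a distributed in-memory key-value store that runs on the compute nodes, talks to the storage servers over ZeroMQ request/reply sockets, and places data according to a distribution policy. This is a minimal, hypothetical example, not the actual IMSS API: the server addresses, the SET/GET message framing, and the InMemoryStoreClient class are assumptions, and the hash-based placement stands in for one of the distribution policies the paper evaluates.

```python
# Minimal sketch of a ZeroMQ-based in-memory store client (illustrative
# only; names, framing, and endpoints are assumptions, not the IMSS API).
import zlib
import zmq

# Hypothetical storage servers, each keeping data blocks in main memory.
SERVERS = ["tcp://node01:5555", "tcp://node02:5555", "tcp://node03:5555"]


class InMemoryStoreClient:
    def __init__(self, servers):
        self.ctx = zmq.Context()
        self.sockets = []
        for addr in servers:
            sock = self.ctx.socket(zmq.REQ)   # request/reply pattern
            sock.connect(addr)
            self.sockets.append(sock)

    def _target(self, key: str) -> zmq.Socket:
        # Hash-based distribution policy: map each key to one server so
        # data and load spread evenly across the compute nodes.
        index = zlib.crc32(key.encode()) % len(self.sockets)
        return self.sockets[index]

    def put(self, key: str, value: bytes) -> None:
        sock = self._target(key)
        # Multipart message: operation, key, payload.
        sock.send_multipart([b"SET", key.encode(), value])
        sock.recv()  # wait for the server's acknowledgement

    def get(self, key: str) -> bytes:
        sock = self._target(key)
        sock.send_multipart([b"GET", key.encode()])
        return sock.recv()


if __name__ == "__main__":
    client = InMemoryStoreClient(SERVERS)
    client.put("dataset/block-0", b"...raw bytes...")
    print(client.get("dataset/block-0"))
```

A different distribution policy (round-robin, locality-aware, or bucket-based placement) would only change the _target function; the ZeroMQ request/reply exchange with the chosen server stays the same, which is what allows the placement strategy to be tuned independently of the communication layer.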

This work was partially supported by the EU project “ASPIDE: Exascale Programming Models for Extreme Data Processing” under grant 801091. This work has been partially funded by the European Union’s Horizon 2020 under the ADMIRE project, grant agreement number 956748-ADMIRE-H2020-JTI-EuroHPC-2019-1. This research was partially supported by the Madrid Regional Government (Spain) under the grant “Convergencia Big Data-HPC: de los sensores a las Aplicaciones (CABAHLA-CM)”. Finally, this work was partially supported by the Spanish Ministry of Science and Innovation project “New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)”, Ref. PID2019-107858GB-I00.

Notes

  1. https://www.weka.io
  2. https://gitlab.arcos.inf.uc3m.es/mandres/imss/blob/master/Middleware_Comparison.pdf
  3. https://gitlab.arcos.inf.uc3m.es/mandres/IMSS
  4. https://cloud.google.com

Author information

Correspondence to Javier Garcia-Blas.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Garcia-Blas, J., Singh, D.E., Carretero, J. (2022). IMSS: In-Memory Storage System for Data Intensive Applications. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-23220-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23219-0

  • Online ISBN: 978-3-031-23220-6

  • eBook Packages: Computer Science, Computer Science (R0)
