IMSS: In-Memory Storage System for Data Intensive Applications

  • Conference paper
  • In: High Performance Computing. ISC High Performance 2022 International Workshops (ISC High Performance 2022)

Abstract

Computer applications are growing in terms of data management requirements. In both scientific and engineering domains, high-performance computing clusters tend to experience bottlenecks in the I/O layer, limiting the scalability of data-intensive applications. Thus, minimizing the number of cycles required by I/O operations constitutes a widely addressed challenge. To cope with this constraint, distributed in-memory store solutions provide a network-attached storage system that uses the main memory of the compute nodes as the storage device. This approach offers temporary but faster storage than solutions based on non-volatile memory such as SSDs. This work presents IMSS, a novel ad-hoc in-memory storage system focused on data management and data distribution. Our solution accelerates both data and metadata management by taking advantage of ZeroMQ, a fast and flexible communication mechanism. One of the main contributions of IMSS is that it incorporates multiple distribution policies for optimizing network performance and improving load balance. The experimental evaluation demonstrates that our proposal outperforms Redis, a well-known in-memory data structure store, in both write and read data accesses.
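
To make the idea concrete, the sketch below illustrates the kind of system the abstract describes: a client for a distributed in-memory key-value store that runs on the compute nodes, talks to the storage servers over ZeroMQ request/reply sockets, and places data according to a distribution policy. This is a minimal, hypothetical example, not the actual IMSS API: the server addresses, the SET/GET message framing, and the InMemoryStoreClient class are assumptions, and the hash-based placement stands in for one of the distribution policies the paper evaluates.

```python
# Minimal sketch of a ZeroMQ-based in-memory store client (illustrative
# only; names, framing, and endpoints are assumptions, not the IMSS API).
import zlib
import zmq

# Hypothetical storage servers, each keeping data blocks in main memory.
SERVERS = ["tcp://node01:5555", "tcp://node02:5555", "tcp://node03:5555"]


class InMemoryStoreClient:
    def __init__(self, servers):
        self.ctx = zmq.Context()
        self.sockets = []
        for addr in servers:
            sock = self.ctx.socket(zmq.REQ)   # request/reply pattern
            sock.connect(addr)
            self.sockets.append(sock)

    def _target(self, key: str) -> zmq.Socket:
        # Hash-based distribution policy: map each key to one server so
        # data and load spread evenly across the compute nodes.
        index = zlib.crc32(key.encode()) % len(self.sockets)
        return self.sockets[index]

    def put(self, key: str, value: bytes) -> None:
        sock = self._target(key)
        # Multipart message: operation, key, payload.
        sock.send_multipart([b"SET", key.encode(), value])
        sock.recv()  # wait for the server's acknowledgement

    def get(self, key: str) -> bytes:
        sock = self._target(key)
        sock.send_multipart([b"GET", key.encode()])
        return sock.recv()


if __name__ == "__main__":
    client = InMemoryStoreClient(SERVERS)
    client.put("dataset/block-0", b"...raw bytes...")
    print(client.get("dataset/block-0"))
```

A different distribution policy (round-robin, locality-aware, or bucket-based placement) would only change the _target function; the ZeroMQ request/reply exchange with the chosen server stays the same, which is what allows the placement strategy to be tuned independently of the communication layer.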

This work was partially supported by the EU project “ASPIDE: Exascale Programming Models for Extreme Data Processing” under grant 801091. This work has been partially funded by the European Union’s Horizon 2020 under the ADMIRE project, grant agreement number 956748-ADMIRE-H2020-JTI-EuroHPC-2019-1. This research was partially supported by the Madrid Regional Government (Spain) under the grant “Convergencia Big Data-HPC: de los sensores a las Aplicaciones (CABAHLA-CM)”. Finally, this work was partially supported by the Spanish Ministry of Science and Innovation project “New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)”, Ref. PID2019-107858GB-I00.

Notes

  1. https://www.weka.io
  2. https://gitlab.arcos.inf.uc3m.es/mandres/imss/blob/master/Middleware_Comparison.pdf
  3. https://gitlab.arcos.inf.uc3m.es/mandres/IMSS
  4. https://cloud.google.com

Author information

Correspondence to Javier Garcia-Blas.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Garcia-Blas, J., Singh, D.E., Carretero, J. (2022). IMSS: In-Memory Storage System for Data Intensive Applications. In: Anzt, H., Bienz, A., Luszczek, P., Baboulin, M. (eds) High Performance Computing. ISC High Performance 2022 International Workshops. ISC High Performance 2022. Lecture Notes in Computer Science, vol 13387. Springer, Cham. https://doi.org/10.1007/978-3-031-23220-6_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-23220-6_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23219-0

  • Online ISBN: 978-3-031-23220-6

  • eBook Packages: Computer Science, Computer Science (R0)
