DICE: Generic Data Abstraction for Enhancing the Convergence of HPC and Big Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1540)

Abstract

Today’s data-intensive applications require access to multiple types of storage platforms, such as parallel file systems, distributed file systems, and in-memory data systems. Many of these applications must also process data streams. The goal is therefore to develop mechanisms that integrate these diverse data sources, hide their diversity from applications, and improve data access performance.

In this work, we propose a data container-based solution for data-intensive applications that provides a high-level programming interface to the different storage systems commonly used in both HPC and HPDA environments. Our approach, DICE, hides the complexity of dealing with data from multiple sources, so end users and developers can access data items transparently and with less effort. The abstraction is built on a series of plugins, which makes it straightforward to extend to other existing file systems.
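The plugin-based container idea can be illustrated with a minimal sketch. This is not DICE's actual interface: the names `StoragePlugin`, `MemoryPlugin`, `LocalFilePlugin`, `register_plugin`, and `DataContainer`, as well as the `scheme://path` addressing, are assumptions made purely for illustration. The local-file plugin merely stands in for a parallel or distributed file system.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical back-end interface (an assumption, not DICE's real API)."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        """Return the bytes stored under `path`."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        """Store `data` under `path`, replacing any previous content."""


class MemoryPlugin(StoragePlugin):
    """Stands in for an in-memory data system."""

    def __init__(self):
        self._items = {}

    def read(self, path):
        return self._items[path]

    def write(self, path, data):
        self._items[path] = bytes(data)


class LocalFilePlugin(StoragePlugin):
    """Stands in for a parallel or distributed file system."""

    def read(self, path):
        with open(path, "rb") as handle:
            return handle.read()

    def write(self, path, data):
        with open(path, "wb") as handle:
            handle.write(data)


# Registry mapping a scheme to the plugin that serves it.
_PLUGINS = {}


def register_plugin(scheme, plugin):
    """Add a back end; new storage systems are supported by registering one."""
    _PLUGINS[scheme] = plugin


class DataContainer:
    """Uniform handle: callers use read/write regardless of the back end."""

    def __init__(self, uri):
        scheme, separator, path = uri.partition("://")
        if not separator or scheme not in _PLUGINS:
            raise ValueError(f"no plugin registered for {uri!r}")
        self._plugin = _PLUGINS[scheme]
        self._path = path

    def read(self):
        return self._plugin.read(self._path)

    def write(self, data):
        self._plugin.write(self._path, data)


register_plugin("mem", MemoryPlugin())
register_plugin("file", LocalFilePlugin())
```

In this sketch, application code such as `DataContainer("mem://results/run1").write(b"abc")` stays the same when the URI is changed to a `file://` path, and supporting a further storage system only requires registering one more plugin.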

This work was supported by the EU project “ASPIDE: Exascale Programming Models for Extreme Data Processing” under grant 801091. It was also partially supported by the Madrid Regional Government (Spain) under the grant “Convergencia Big Data-HPC: de los sensores a las Aplicaciones (CABAHLA-CM)”, and by the Spanish Ministry of Science and Innovation project “New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)”, Ref. PID2019-107858GB-I00.

Notes

  1. Source code and examples available at https://gitlab.arcos.inf.uc3m.es/pbrox/dice.git.

Author information

Corresponding author

Correspondence to Javier Garcia-Blas.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Brox, P., Garcia-Blas, J., Singh, D.E., Carretero, J. (2022). DICE: Generic Data Abstraction for Enhancing the Convergence of HPC and Big Data. In: Gitler, I., Barrios Hernández, C.J., Meneses, E. (eds) High Performance Computing. CARLA 2021. Communications in Computer and Information Science, vol 1540. Springer, Cham. https://doi.org/10.1007/978-3-031-04209-6_8

  • DOI: https://doi.org/10.1007/978-3-031-04209-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-04208-9

  • Online ISBN: 978-3-031-04209-6

  • eBook Packages: Computer Science, Computer Science (R0)