DataSpaces: an interaction and coordination framework for coupled simulation workflows

Cluster Computing

Abstract

Emerging high-performance distributed computing environments are enabling new end-to-end formulations in science and engineering that involve multiple interacting processes and data-intensive application workflows. For example, current fusion simulation efforts are exploring coupled models and codes that simultaneously simulate separate application processes, such as the core and the edge turbulence. These components run on different high-performance computing resources and need to interact at runtime with each other and with services for data monitoring, data analysis and visualization, and data archiving. As a result, they require efficient and scalable support for dynamic and flexible couplings and interactions, which remains a challenge. This paper presents DataSpaces, a flexible interaction and coordination substrate that addresses this challenge. DataSpaces essentially implements a semantically specialized virtual shared space abstraction that can be associatively accessed by all components and services in the application workflow. It enables live data to be extracted from running simulation components, indexes this data online, and then allows it to be monitored, queried and accessed by other components and services via the space using semantically meaningful operators. The underlying data transport is asynchronous, low-overhead and largely memory-to-memory. The design, implementation, and experimental evaluation of DataSpaces using a coupled fusion simulation workflow are presented.
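
To make the idea of an associatively accessed shared space more concrete, the following is a minimal, single-process sketch in Python. It is not the DataSpaces API: the class `SharedSpace`, the helper `box_cells`, the `put`/`get` signatures, and the variable name `"temperature"` are all hypothetical and chosen only for illustration. The sketch assumes data is described by a variable name, a version, and an inclusive bounding box in a global index domain, and it ignores distribution, asynchronous transport, and memory-to-memory transfers entirely.

```python
from itertools import product


def box_cells(lower, upper):
    """Enumerate every index tuple in an inclusive box, in row-major order."""
    return product(*(range(lo, hi + 1) for lo, hi in zip(lower, upper)))


class SharedSpace:
    """Toy in-memory stand-in for a shared space (hypothetical, not the DataSpaces API)."""

    def __init__(self):
        # Online index: (variable name, version) -> {global index tuple: value}
        self._index = {}

    def put(self, name, version, lower, upper, values):
        """Insert a block of data described by its bounding box in the global domain."""
        cells = list(box_cells(lower, upper))
        if len(cells) != len(values):
            raise ValueError("values do not fill the described box")
        self._index.setdefault((name, version), {}).update(zip(cells, values))

    def get(self, name, version, lower, upper):
        """Return the stored cells that fall inside the queried box."""
        stored = self._index.get((name, version), {})
        return {c: stored[c] for c in box_cells(lower, upper) if c in stored}


# Two producers each insert the part of a 4x4 field that they own.
space = SharedSpace()
space.put("temperature", 0, (0, 0), (1, 3), list(range(0, 8)))
space.put("temperature", 0, (2, 0), (3, 3), list(range(8, 16)))

# A consumer queries a window that straddles both blocks, without knowing
# which producer holds which part of the domain.
window = space.get("temperature", 0, (1, 1), (2, 2))
print(window)
```

The consumer addresses data by what it is (name, version, region) rather than by where it lives, which is the sense in which access here is associative. A real coupling substrate would additionally have to distribute the index and move the data between address spaces.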

Author information

Corresponding author

Correspondence to Ciprian Docan.

About this article

Cite this article

Docan, C., Parashar, M. & Klasky, S. DataSpaces: an interaction and coordination framework for coupled simulation workflows. Cluster Comput 15, 163–181 (2012). https://doi.org/10.1007/s10586-011-0162-y

  • DOI: https://doi.org/10.1007/s10586-011-0162-y
