A simulation provenance data management system for efficient job execution on an online computational science engineering platform

Cluster Computing

Abstract

Over the past few years, an online simulation service platform named EDISON has been well received by computational science and engineering communities in several countries. Although it is equipped with multiple computing clusters and high-end storage resources, the platform has struggled to handle a huge number of CPU- and I/O-bound simulations, most of which are duplicates. Such intensive simulations are normally admitted without duplicate elimination and can therefore degrade the platform's performance. To address this concern, we propose a novel system, termed SuperMan, that seamlessly records and retrieves the provenance of previously executed simulations and thereby prevents users from launching duplicate or similar simulations on the limited computing resources. The system collects simulation provenance in a variant of a de facto standard format, thereby offering interoperability. Based on the stored provenance, the system can provide useful simulation run statistics to users who need assistance. SuperMan also applies a hash-based duplicate elimination technique, which makes simulation execution on the platform more efficient. Finally, we show that the proposed system could remove slightly over half of the duplicate simulations across a variety of simulation software, saving about 30% of overall elapsed time and about 25% of queuing time.
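The hash-based duplicate elimination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe SuperMan's actual hashing scheme or storage layer, so everything here is an assumption made for illustration: the names `provenance_key` and `ProvenanceStore`, the choice of SHA-256 over a canonical JSON encoding, the fields `solver`, `version`, and `params`, and the in-memory dictionary standing in for a provenance database.

```python
import hashlib
import json


def provenance_key(solver, version, params):
    """Hash a canonical JSON encoding of a simulation request.

    Hypothetical helper: sorting keys makes the hash independent of the
    order in which the user supplied the input parameters.
    """
    canonical = json.dumps(
        {"solver": solver, "version": version, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProvenanceStore:
    """In-memory stand-in (an assumption) for a provenance database."""

    def __init__(self):
        self._records = {}
        self.hits = 0    # duplicate requests answered from stored provenance
        self.misses = 0  # requests that actually had to be executed

    def run(self, solver, version, params, execute):
        """Return a stored result for a duplicate request, else execute."""
        key = provenance_key(solver, version, params)
        record = self._records.get(key)
        if record is not None:
            self.hits += 1
            return record["result"]
        self.misses += 1
        result = execute(params)
        # Record the provenance of the run so later duplicates are skipped.
        self._records[key] = {
            "solver": solver,
            "version": version,
            "params": params,
            "result": result,
        }
        return result
```

In this sketch, two requests with the same solver, version, and parameters map to the same key even if the parameters arrive in a different order, so only the first one consumes cluster time. Note that the encoding is exact: `2` and `2.0` produce different keys, so a real system would need to normalize numeric inputs before hashing.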

Notes

  1. Or, science apps. In this article, the two terms are used interchangeably.

  2. NCN was established in 2002 and is funded by the National Science Foundation (NSF) to support the National Nanotechnology Initiative (NNI).

Acknowledgements

This research was supported by the EDISON (EDucation-research Integration through Simulation On the Net) Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. NRF-2011-0020576). This study was also supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2018R1C1B6006409).

Author information

Corresponding authors

Correspondence to Kum Won Cho or Young-Kyoon Suh.

About this article

Cite this article

Ma, J., Lee, S., Cho, K.W. et al. A simulation provenance data management system for efficient job execution on an online computational science engineering platform. Cluster Comput 22, 147–159 (2019). https://doi.org/10.1007/s10586-018-2827-2

  • DOI: https://doi.org/10.1007/s10586-018-2827-2
