Adaptive Executions of Multi-Physics Coupled Applications on Batch Grids

Journal of Grid Computing

Abstract

Long-running multi-physics coupled parallel applications have gained prominence in recent years. Their high computational requirements and long simulation durations necessitate the use of multiple systems of a Grid for execution. In this paper, we have built an adaptive middleware framework for executing long-running multi-physics coupled applications across multiple batch systems of a Grid. Besides coordinating the executions of an application's component jobs on different batch systems, our framework automatically resubmits the jobs to the batch queues as many times as needed to sustain long-running executions. As the set of active batch systems available for execution changes, our framework migrates and reschedules components using a robust rescheduling decision algorithm. We have used our framework to improve the application throughput of a prominent long-running multi-component climate modeling application, the Community Climate System Model (CCSM). Our real multi-site experiments with CCSM indicate that Grid executions can lead to improved application throughput for climate models.
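The resubmission idea described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's framework or its rescheduling decision algorithm: the `BatchSystem` fields, the function names, and the "pick the system that advances the run furthest" rule are all hypothetical stand-ins chosen for illustration. It only shows how a run longer than any single walltime limit might be sustained by resubmitting from the last restart point while the set of active systems changes.

```python
from dataclasses import dataclass


@dataclass
class BatchSystem:
    name: str
    max_walltime_hours: float   # per-submission queue limit (hypothetical)
    hours_per_sim_day: float    # cost of one simulated day here (hypothetical)


def days_per_submission(system):
    """Whole simulated days that fit into one submission on this system."""
    return int(system.max_walltime_hours // system.hours_per_sim_day)


def run_with_resubmission(total_days, active_sets):
    """Simulate repeated resubmission until total_days have been simulated.

    active_sets lists, for each submission round, the batch systems that
    are active in that round. Each round picks the active system that
    advances the run furthest (an assumed placeholder rule, not the
    paper's algorithm) and continues from the previous restart point.
    Returns a log of (system name, cumulative simulated days).
    """
    done = 0
    log = []
    for active in active_sets:
        if done >= total_days:
            break                 # the long-running execution is complete
        if not active:
            continue              # no system active in this round: wait
        best = max(active, key=days_per_submission)
        step = min(days_per_submission(best), total_days - done)
        done += step              # the restart point advances
        log.append((best.name, done))
    return log


# Illustrative inputs only; these are not measurements from the paper.
a = BatchSystem("A", max_walltime_hours=12, hours_per_sim_day=2)
b = BatchSystem("B", max_walltime_hours=24, hours_per_sim_day=3)
log = run_with_resubmission(20, [[a], [a, b], [], [b], [a]])
print(log)
```

In this toy run the job starts on system A, moves to B when B becomes active, idles through the round in which no system is active, and finishes on B. A real middleware would also have to account for queue wait times, checkpoint and data transfer costs, and the coupling between components, none of which is modeled here.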



Author information

Corresponding author

Correspondence to Sivagama Sundari Murugavel.

Additional information

This work is supported partly by Ministry of Information Technology, India, project ref. no. DIT/R&D/C-DAC/2(10)/2006 DT.30/04/07 and partly by Department of Science and Technology, India, project ref no. SR/S3/EECE/59/2005/8.6.06.

About this article

Cite this article

Murugavel, S.S., Vadhiyar, S.S. & Nanjundiah, R.S. Adaptive Executions of Multi-Physics Coupled Applications on Batch Grids. J Grid Computing 9, 455–478 (2011). https://doi.org/10.1007/s10723-011-9197-9

  • DOI: https://doi.org/10.1007/s10723-011-9197-9
