JETS: Language and System Support for Many-Parallel-Task Workflows

Journal of Grid Computing

Abstract

Many-task computing is a well-established paradigm for implementing loosely coupled applications (tasks) on large-scale computing systems. However, few of the model’s existing implementations provide efficient, low-latency support for executing tasks that are tightly coupled multiprocessing applications. Thus, a vast array of parallel applications cannot readily be used within many-task workloads. In this work, we present JETS, a middleware component that provides high-performance support for many-parallel-task computing (MPTC). JETS is based on a highly concurrent approach to parallel task dispatch and on new capabilities now available in the MPICH2 MPI implementation and the ZeptoOS Linux operating system. JETS represents an advance over the few known examples of multilevel many-parallel-task scheduling systems: it more efficiently schedules and launches many short-duration parallel application invocations; it overcomes the challenges of coupling the user processes of each multiprocessing application invocation via the messaging fabric; and it concurrently manages many application executions in various stages. We report here on the JETS architecture and its performance on both synthetic benchmarks and an MPTC application in molecular dynamics.

Author information

Corresponding author

Correspondence to Justin M. Wozniak.

About this article

Cite this article

Wozniak, J.M., Wilde, M. & Katz, D.S. JETS: Language and System Support for Many-Parallel-Task Workflows. J Grid Computing 11, 341–360 (2013). https://doi.org/10.1007/s10723-013-9259-2

  • DOI: https://doi.org/10.1007/s10723-013-9259-2
