Abstract
Many-task computing is a well-established paradigm for implementing loosely coupled applications (tasks) on large-scale computing systems. However, few of the model’s existing implementations provide efficient, low-latency support for executing tasks that are tightly coupled multiprocessing applications. Thus, a vast array of parallel applications cannot readily be used effectively within many-task workloads. In this work, we present JETS, a middleware component that provides high performance support for many-parallel-task computing (MPTC). JETS is based on a highly concurrent approach to parallel task dispatch and on new capabilities now available in the MPICH2 MPI implementation and the ZeptoOS Linux operating system. JETS represents an advance over the few known examples of multilevel many-parallel-task scheduling systems: it more efficiently schedules and launches many short-duration parallel application invocations; it overcomes the challenges of coupling the user processes of each multiprocessing application invocation via the messaging fabric; and it concurrently manages many application executions in various stages. We report here on the JETS architecture and its performance on both synthetic benchmarks and an MPTC application in molecular dynamics.
Similar content being viewed by others
References
Abramson, D., Giddy, J., Kotler, L.: High performance parametric modeling with Nimrod/G: killer application for the global Grid. In: Proc. International Parallel and Distributed Processing Symposium (2000)
Armstrong, T.G., Zhang, Z., Katz, D.S., Wilde, M., Foster, I.T.: Scheduling many-task workloads on supercomputers: dealing with trailing tasks. In: Proc. MTAGS Workshop at SC’10 (2010)
Berman, F., Wolski, R., Casanova, H., Cirne, W., Dail, H., Faerman, M., Figueira, S., Hayes, J., Obertelli, G., Schopf, J., Shao, G., Smallen, S., Spring, N., Su, A., Zagorodnov, D.: Adaptive computing on the Grid using AppLeS. IEEE Trans. Parallel Distrib. Syst. 14(4), 369–382 (2003)
Boker, S., Neale, M., Maes, H., Wilde, M., Spiegel, M., Brick, T., Spies, J., Estabrook, R., Kenny, S., Bates, T., Mehta, P., Fox, J.: OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2), 306–317 (2011)
Budnik, T., Knudson, B., Megerian, M., Miller, S., Mundy, M., Stockdell, W.: Blue Gene/Q resource management architecture. In: Proc. Workshop on Many-Task Computing on Grids and Supercomputers (2010)
Chakraborty, P., Jha, S., Katz, D.S.: Novel submission modes for tightly coupled jobs across distributed resources for reduced time-to-solution. Phil. Trans. R. Soc. A, Math. Phys. Eng. Sci. 367(1897), 2545–2556 (2009)
Chiu, P.-H., Potekhin, M.: Pilot factory—a Condor-based system for scalable pilot job generation in the Panda WMS framework. J. Phys. Conf. Ser. 219, 062041 (2011)
Cobalt web site. http://trac.mcs.anl.gov/projects/cobalt. Accessed 30 May 2013
Cray Inc. Workload Management and Application Placement for the Cray Linux Environment: Document number S–2496–3103. Cray Inc., Chippewa Falls, WI, USA (2011)
Czajkowski, K., Foster, I., Karonis, N., Kesselman, C., Martin, S., Smith, W., Tuecke, S.: A resource management architecture for metacomputing systems. Lect. Notes Comput. Sci. 1459, 62–82 (1998)
DeBartolo, J., Hocky, G., Wilde, M., Xu, J., Freed, K.F., Sosnick, T.R.: Protein structure prediction enhanced with evolutionary diversity: speed. Protein Sci. 19(3), 520–534 (2010)
Dinan, J., Krishnamoorthy, S., Larkins, D.B., Nieplocha, J., Sadayappan, P.: Scioto: a framework for global-view task parallelism. In: Intl. Conf. on Parallel Processing, pp. 586–593 (2008)
Fedorov, A., Clifford, B., Warfield, S.K., Kikinis, R., Chrisochoides, N.: Non-rigid registration for image-guided neurosurgery on the TeraGrid: a case study. Technical Report WM-CS-2009-05, College of William and Mary (2009)
Foley, S.S., Elwasif, W.R., Shet, A.G., Bernholdt, D.E., Bramley, R.: Incorporating concurrent component execution in loosely coupled integrated fusion plasma simulation. In: Component-Based High-Performance Computing 2008 (2008)
Foster, I.: What is the Grid? A three point checklist. GRIDToday 1(6) (2002)
Foster, I., Kesselman, C. (eds.): The Grid: Blueprint for a New Computing Infrastructure, 1st edn. Morgan Kaufmann (1999)
Frey, J., Tannenbaum, T., Foster, I., Livny, M., Tuecke, S.: Condor-G: a computation management agent for multi-institutional Grids. Cluster Comput. 5(3), 237–246 (2002)
Hasson, U., Skipper, J.I., Wilde, M.J., Nusbaum, H.C., Small, S.L.: Improving the analysis, storage and sharing of neuroimaging data using relational databases and distributed computing. NeuroImage 39(2), 693–706 (2008)
Hategan, M., Wozniak, J.M., Maheshwari, K.: Coasters: uniform resource provisioning and access for scientific computing on clouds and Grids. In: Proc. Utility and Cloud Computing (2011)
Henderson, R.L., Tweten, D.: Portable batch system: requirement specification. Technical report, NAS Systems Division, NASA Ames Research Center (1998)
Hocky, G., Wilde, M., DeBartolo, J., Hategan, M., Foster, I., Sosnick, T.R., Freed, K.F.: Towards petascale ab initio protein folding through parallel scripting. Technical Report ANL/MCS-P1612-0409, Argonne National Laboratory (2009)
Kenny, S., Andric, M., Boker, S.M., Neale, M.C., Wilde, M., Small, S.L.: Parallel workflows for data-driven structural equation modeling in functional neuroimaging. Front. Neuroinform. 3(34) (2009). doi:10.3389%2Fneuro.11.034.2009
Kernighan, B.W., Pike, R.: The UNIX Programming Environment. Prentice Hall (1984)
Lee, S., Chen, Y., Luo, H., Wu, A.A., Wilde, M., Schumacker, P.T., Zhao, Y.: The first global screening of protein substrates bearing protein-bound 3,4-dihydroxyphenylalanine in Escherichia coli and human mitochondria. J. Proteome Res. 9(11), 5705–5714 (2010)
Litzkow, M., Livny, M., Mutka, M.: Condor—a hunter of idle workstations. In: Proc. International Conference of Distributed Computing Systems (1988)
Luckow, A., Lacinski, L., Jha, S.: SAGA BigJob: an extensible and interoperable pilot-job abstraction for distributed applications and systems. In: Proc. CCGrid (2010)
Lusk, E.L., Pieper, S.C., Butler, R.M.: More scalability, less pain: a simple programming model and its implementation for extreme computing. SciDAC Rev. 17, 992056 (2010)
MPICH web site. http://www.mpich.org. Accessed 30 May 2013
Nieplocha, J., Harrison, R.J., Littlefield, R.J.: Global arrays: a nonuniform memory access programming model for high-performance computers. J. Supercomputing 10(2), 1–17 (1996)
NMA structure in the Protein Data Bank. http://www.rcsb.org/pdb/ligand/ligandsummary.do?hetId=NMA. Accessed 30 May 2013
OpenSSH web site. http://www.openssh.com. Accessed 30 May 2013
Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kalé, L., Schulten, K.: Scalable molecular dynamics with NAMD. J. Comput. Chem. 26(16), 1781–1802 (2005)
Raicu, I., Foster, I., Zhao, Y.: Many-task computing for Grids and supercomputers. In: Proc. Workshop on Many-Task Computing on Grids and Supercomputers (2008)
Raicu, I., Zhang, Z., Wilde, M., Foster, I., Beckman, P., Iskra, K., Clifford, B.: Towards loosely-coupled programming on petascale systems. In: Proc. SC’08 (2008)
Raicu, I., Zhao, Y., Foster, I.T., Szalay, A.: Accelerating large-scale data exploration through data diffusion. In: Proc. Workshop on Data-aware Distributed Computing (2008)
Schmuck, F., Haskin, R.: GPFS: a shared-disk file system for large computing clusters. In: Proc. USENIX Conference on File and Storage Technologies (2002)
Sfiligoi, I.: glideinWMS a generic pilot-based workload management system. J. Phys. Conf. Ser. 119(6), 062044 (2008)
Stef-Praun, T., Clifford, B., Foster, I., Hasson, U., Hategan, M., Small, S.L., Wilde, M., Zhao, Y.: Accelerating medical research using the Swift workflow system. Stud. Health Technol. Inform. 126, 207–216 (2007)
Stef-Praun, T., Madeira, G.A., Foster, I., Townsend, R.: Accelerating solution of a moral hazard problem with Swift. In: e-Social Science 2007, Indianapolis (2007)
Sugita, Y., Okamoto, Y.: Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 314(1–2), 141–151 (1999)
Sun Grid Engine web site. http://www.oracle.com/technetwork/oem/grid-engine-166852.html. Accessed 30 May 2013
Thain, D., Tannenbaum, T., Livny, M.: Distributed computing in practice: the Condor experience. Concurrency Computat. Pract. Exper. 17(2–4), 325–356 (2005)
Thota, A., Luckow, A., Jha, S.: Efficient large-scale replica-exchange simulations on production infrastructure. Phil. Trans. R. Soc. Lond. A 369(1949), 3318–3335 (2011)
Top 500 web site. http://www.top500.org. Accessed 30 May 2013
Using the Hydra process manager. https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager. Accessed 30 May 2013
von Laszewski, G., Foster, I., Gawor, J., Lane, P.: A Java commodity Grid kit. Concurrency Computat. Pract. Exper. 13(8–9), 645–662 (2001)
Wibisono, A., Zhao, Z., Belloum, A., Bubak, M.: A framework for interactive parameter sweep applications. In: Bubak, M., van Albada, G., Dongarra, J., Sloot, P. (eds.) Computational Science—ICCS 2008. Lecture Notes in Computer Science, vol. 5103. Springer, Berlin/Heidelberg (2008)
Wilde, M., Foster, I., Iskra, K., Beckman, P., Zhang, Z., Espinosa, A., Hategan, M., Clifford, B., Raicu, I.: Parallel scripting for applications at the petascale and beyond. Computer 42(11), 50–60 (2009)
Wilde, M., Hategan, M., Wozniak, J.M., Clifford, B., Katz, D.S., Foster, I.: Swift: a language for distributed parallel scripting. Parallel Comput. 37(9), 633–652 (2011)
Wozniak, J.M., Wilde, M.: Case studies in storage access by loosely coupled petascale applications. In: Proc. Petascale Data Storage Workshop at SC’09 (2009)
Wozniak, J.M., Jacobs, B., Latham, R., Lang, S., Son, S.W., Ross, R.: Implementing reliable data structures for MPI services in high component count systems. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface. Lecture Notes in Computer Science, vol. 5759. Springer (2009)
Zhang, Z., Espinosa, A., Iskra, K., Raicu, I., Foster, I., Wilde, M.: Design and evaluation of a collective I/O model for loosely-coupled petascale programming. In: Proc. MTAGS Workshop at SC’08 (2008)
Zhao, Y., Hategan, M., Clifford, B., Foster, I., von Laszewski, G., Raicu, I., Stef-Praun, T., Wilde, M.: Swift: Fast, reliable, loosely coupled parallel computation. In: Proc. Workshop on Scientific Workflows (2007)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Wozniak, J.M., Wilde, M. & Katz, D.S. JETS: Language and System Support for Many-Parallel-Task Workflows. J Grid Computing 11, 341–360 (2013). https://doi.org/10.1007/s10723-013-9259-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10723-013-9259-2