Abstract
Despite maturing in many ways, heterogeneous distributed computing platforms still require substantial software installation and management effort to be used efficiently, often necessitating manual intervention by resource providers and end-users. In this paper we propose a novel model of resource sharing as a viable alternative to the one commonly adopted in the grid community. Our model, termed Unibus, shifts the responsibility for resource virtualization and aggregation to client-side software, taking these burdens away from resource providers. Drawing on parallels with operating systems, we argue that distributed resources may be unified and aggregated at the user’s end, in a manner similar to ordinary peripheral devices. Running on the user’s access device, the overlay system software can virtualize remote resources via dynamically deployed software mediators analogous to device drivers, reconfiguring the resources if necessary via “firmware” modules. To illustrate the feasibility of the Unibus model, we have prototyped a development toolkit that automates the installation, build, run, and post-processing stages of MPI applications. Through the provided console, this toolkit can deploy and configure an MPI execution environment across a set of heterogeneous, isolated distributed resources, turning them into a coherent virtual machine with a single interface point. We conducted a series of experiments with the NAS Parallel Benchmarks. The results indicate that the toolkit preserves the application performance of “bare” MPI while substantially reducing maintenance and configuration effort. Overall, the results suggest that the envisioned client-side overlay model for resource sharing may address some of the long-standing obstacles in building heterogeneous HPC systems.
Additional information
Research supported in part by U.S. DoE grant DE-FG02-02ER25537 and NSF grant ACI-0220183. An earlier version of the material in Section 5 of this paper was submitted to PDP 2007.
Cite this article
Kurzyniec, D., Sławińska, M., Sławiński, J. et al. Unibus: a contrarian approach to grid computing. J Supercomput 42, 125–144 (2007). https://doi.org/10.1007/s11227-006-0033-0
DOI: https://doi.org/10.1007/s11227-006-0033-0