Metacomputing across intercontinental networks

https://doi.org/10.1016/S0167-739X(01)00032-2

Abstract

An intercontinental network of supercomputers spanning more than 10 000 miles and running challenging scientific applications was realized at the Supercomputing ’99 (SC99) conference in Portland, OR, using PACX-MPI and ATM PVCs. In this paper, we describe how we constructed this heterogeneous cluster of supercomputers, the multi-architecture problems we confronted, and the way several applications handled the specific requirements of a metacomputer.

Overview of the SC99 global network

During SC99 a network connection was set up that connected systems in Europe, the US and Japan. For the experiments described in this paper, four supercomputers were linked together.

  • A Hitachi SR8000 at Tsukuba/Japan with 512 processors (64 nodes).

  • A Cray T3E-900/512 at Pittsburgh/USA with 512 processors.

  • A Cray T3E-1200/576 at Manchester/UK with 576 processors.

  • A Cray T3E-900/512 at Stuttgart/Germany with 512 processors.

Together this virtual supercomputer had a peak performance of roughly 2.1 Tflops.
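
As a rough cross-check of this figure, the sketch below sums the nominal peaks of the four machines. The per-node and per-processor values (about 8 Gflop/s per SR8000 node, 900 and 1200 Mflop/s per T3E processor) are standard vendor figures assumed by us, not numbers quoted in the text above.

    /* Back-of-the-envelope aggregate peak of the SC99 metacomputer.
     * The per-node and per-PE peaks are nominal vendor figures assumed
     * here; they are not quoted in the text above. */
    #include <stdio.h>

    int main(void)
    {
        double gflops = 0.0;
        gflops += 64  * 8.0;   /* Hitachi SR8000, 64 nodes, ~8 Gflop/s per node */
        gflops += 512 * 0.9;   /* Cray T3E-900/512, 900 Mflop/s per processor   */
        gflops += 576 * 1.2;   /* Cray T3E-1200/576, 1200 Mflop/s per processor */
        gflops += 512 * 0.9;   /* Cray T3E-900/512, 900 Mflop/s per processor   */
        printf("aggregate peak: %.2f Tflop/s\n", gflops / 1000.0);  /* ~2.12 */
        return 0;
    }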

Problems of multi-architecture

Heterogeneous metacomputing introduces some problems that are similar to those well known from cluster computing and some that are very specific [2]. The most important ones are

  • different data representations on each system,

  • different processor speeds on each system,

  • different communication speeds for internal messages on each system,

  • different communication speeds of messages internal to a system and messages between systems,

  • lack of a common file system,

  • lack of resource management.

The problem of …
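
The last two communication-related points in the list above are typically the ones an application has to address explicitly. PACX-MPI presents all participating machines as a single MPI_COMM_WORLD, so a code that wants to keep latency-sensitive traffic inside one machine has to recover that hierarchy itself. The fragment below is a minimal sketch of one way to do so, grouping ranks by processor name with MPI_Comm_split; it is our illustration, not part of PACX-MPI or of the applications described here, and a production code would use a proper site identifier rather than a name hash.

    /* Sketch: derive a per-machine communicator inside the metacomputer-wide
     * MPI_COMM_WORLD provided by PACX-MPI, so that frequent exchanges stay on
     * the fast internal network and only aggregated data crosses the wide-area
     * links.  Hashing the processor name is only a stand-in; a production code
     * would use a site identifier supplied by the runtime or a config file. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int world_rank, local_rank, len, i, colour = 0;
        char host[MPI_MAX_PROCESSOR_NAME];
        MPI_Comm local;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Get_processor_name(host, &len);

        for (i = 0; i < len; i++)              /* crude hash of the host name */
            colour = (colour * 31 + host[i]) & 0x7fffffff;

        MPI_Comm_split(MPI_COMM_WORLD, colour, world_rank, &local);
        MPI_Comm_rank(local, &local_rank);

        printf("global rank %d is local rank %d on %s\n",
               world_rank, local_rank, host);

        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }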

Use of experimental data

In this section, we describe a novel use of metacomputing, namely the processing of data from an experimental facility (in this case the Jodrell Bank radio telescope). This presents particular challenges to our metacomputer, since the distribution of the data makes severe demands on the intercontinental bandwidth. This experiment coupled only the three T3E machines.
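
The snippet does not describe the actual data-handling pipeline, but one common pattern for this kind of experiment, shown below purely as our own illustration, is to let the rank nearest the data source cut the incoming stream into blocks and scatter them over the metacomputer-wide communicator, so that each sample crosses the wide-area links exactly once before being processed locally. The block size and data layout in the sketch are invented placeholders.

    /* Toy sketch: rank 0 (assumed to sit near the data source) scatters
     * fixed-size blocks of observation data across all ranks of the
     * metacomputer-wide communicator.  The block size and float layout are
     * invented placeholders, not values from the experiment. */
    #include <mpi.h>
    #include <stdlib.h>

    #define BLOCK 262144   /* samples handed to each rank per round (assumed) */

    int main(int argc, char **argv)
    {
        int rank, size;
        float *all = NULL, *mine;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        mine = malloc(BLOCK * sizeof(float));
        if (rank == 0)     /* rank 0 reads or receives the raw telescope data */
            all = calloc((size_t)size * BLOCK, sizeof(float));

        /* Each sample crosses the wide-area links exactly once; the received
           block is then processed on the fast local machine. */
        MPI_Scatter(all, BLOCK, MPI_FLOAT, mine, BLOCK, MPI_FLOAT,
                    0, MPI_COMM_WORLD);

        /* ... process the local block here ... */

        free(mine);
        free(all);
        MPI_Finalize();
        return 0;
    }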

Computational fluid dynamics

Another application used during SC99 was a CFD code called URANUS (Upwind Relaxation Algorithm for Nonequilibrium flows of the University of Stuttgart) [4]. This program has been developed to simulate the reentry phase of a space vehicle over a wide altitude and velocity range. URANUS was tested in such a metacomputing environment because its non-equilibrium part has been finished and will be parallelized soon. For this version of URANUS, the memory requirements of a large configuration exceed …

Molecular dynamics

A further application that was adapted for metacomputing is a molecular dynamics program for short-range interactions [5] that simulates granular matter. The parallel paradigm applied here is domain decomposition and message passing with MPI (Message Passing Interface): every CPU is responsible for a part of the domain and has to exchange information about the border of its domain with its adjacent neighbors. Static load balancing can be implemented by assigning domains of different size …
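
A minimal sketch of the two ingredients mentioned above, border (halo) exchange between adjacent sub-domains and static load balancing by giving faster processors larger sub-domains, is given below. It is our illustration rather than the authors' code; the speed weights and cell count are hypothetical, and a real MD code would decompose the domain in two or three dimensions.

    /* Sketch: 1D domain decomposition with static load balancing and
     * halo (border) exchange, the two ingredients named above.  The speed
     * weights and cell count are hypothetical placeholders. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NCELLS_TOTAL 1000000   /* total number of cells in the 1D domain */

    int main(int argc, char **argv)
    {
        int rank, size, my_cells, left, right;
        double my_speed, total_speed, *cell;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Hypothetical relative speed of the processor this rank runs on,
           e.g. 0.9 for a T3E-900 PE and 1.2 for a T3E-1200 PE. */
        my_speed = (rank % 2 == 0) ? 0.9 : 1.2;
        MPI_Allreduce(&my_speed, &total_speed, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        /* Static load balancing: domain size proportional to relative speed. */
        my_cells = (int)(NCELLS_TOTAL * my_speed / total_speed);
        cell = calloc(my_cells + 2, sizeof(double));   /* +2 halo cells */

        left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Halo exchange: send own border cells, receive the neighbours'. */
        MPI_Sendrecv(&cell[1], 1, MPI_DOUBLE, left, 0,
                     &cell[my_cells + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&cell[my_cells], 1, MPI_DOUBLE, right, 0,
                     &cell[0], 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d owns %d cells\n", rank, my_cells);
        free(cell);
        MPI_Finalize();
        return 0;
    }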

Conclusion and future plan

We have described the realization of a global metacomputer comprising supercomputers in Japan, the USA, Germany and the UK. The network, spanning 10 000 miles and several research networks, was optimized by using either dedicated PVCs or improved routes. During SC99, the usability of this metacomputer was demonstrated by several demanding applications. This takes the concept of a global metacomputer forward in two major ways. Firstly, we have shown that it is possible to link together …

Acknowledgements

The authors would like to thank Tsukuba Advanced Computing Center, Pittsburgh Supercomputing Center, Manchester Computing Center, High-performance Computing Center Stuttgart, BelWü, DFN, JANET, TEN-155, Abilene, vBNS, STAR-TAP, TransPAC, APAN and IMnet for their support in these experiments.

Stephen Pickles has been involved in both commercial and scientific computing since 1982 when he joined ICL (Australia) as trainee programmer. He graduated in physics from Macquarie University in 1994, where he won the university medal. He obtained his PhD in lattice QCD from the University of Edinburgh in 1998. He is currently the leader of the CSAR Technology Refresh Team at the University of Manchester. He has publications in quantum optics, lattice QCD, and grid computing. His current research interests are in parallel and distributed computing techniques and their application to scientific problems.

John Brooke graduated in mathematics from the University of Manchester in 1973 and gained a Post-Graduate certificate in education in 1974. From 1974 to 1990 he taught mathematics to children aged between 7 and 16, specializing in work with pupils from disadvantaged social and economic backgrounds. In 1990, he joined the University of Manchester to develop a training programme in high-performance computing and completed a PhD in mathematics in 1997. He currently leads a team of researchers and consultants working on projects to develop grid computing on local, national and global scales. He is also active in research in astrophysics and nonlinear mathematics, with special reference to the study of intermittency and symmetry-breaking in rotating systems. He is a Fellow of the Royal Astronomical Society and an honorary lecturer in mathematics at the University of Manchester. More details of his research can be found at http://www.csar.cfs.ac.uk/staff/brooke.

Fumie Costen received the B.E. and M.E. degrees in electronics from Kyoto University, Japan, in 1991 and 1993, respectively. In 1993 and 1996, she joined, respectively, the Optical and Radio Communications Research Laboratories and the Adaptive Communications Research Laboratories of Advanced Telecommunication Research International to continue her work on direction-of-arrival (DOA) estimation. In 1998, she joined Manchester Computing at the University of Manchester and carried out research on metacomputing. Since 2000, she has been a lecturer in the Department of Computer Science at the University of Manchester. She is widening her research area from DOA estimation to signal processing for wireless mobile telecommunications with CDMA/OFDM.

Edgar Gabriel received his diploma degree in mechanical engineering from the University of Stuttgart in 1998. Since June 1998 he has been a member of the Parallel Computing Department at the High-performance Computing Center Stuttgart (HLRS), where he is involved in the metacomputing activities and coordinates the development of PACX-MPI. Furthermore, he is responsible for the technical management of several national and international projects. Since January 2001, he has been leading the working group Parallel and Distributed Systems.

Matthias Mueller is a research scientist at HLRS. He received his diploma degree in physics from the University of Stuttgart in 1996; his master's thesis on quasicrystals was carried out at the Department of Theoretical and Applied Physics. He then wrote his PhD thesis at the Institute of Computer Applications in Stuttgart on fast algorithms for particle simulations. During this period he started to work on a metacomputing project, in which he participated with his applications. His scientific interests are metacomputing with a focus on applications, physics on high-performance computers and object-oriented programming for parallel computing.

Michael M. Resch received his diploma degree in technical mathematics from the Technical University of Graz, Austria, in 1990. He presented his PhD thesis on “Metacomputing for engineering applications” at the University of Stuttgart, Germany, in 2001. From 1990 to 1993 he was with JOANNEUM RESEARCH, a leading Austrian research company. Since 1993 he has been with the High-performance Computing Center Stuttgart (HLRS); since 1998 he has headed the Parallel Computing Group of HLRS, and since 2000 he has been head of HLRS. His current research interests include parallel programming models, metacomputing and the numerical simulation of blood flow. He was and is involved in a number of European projects, such as CAESAR, HPS-ICE, METODIS, DECAST and DAMIEN, and is responsible for the international collaborations of the HLRS.

Stephen Ord obtained an honours degree in Physics from Leicester University and a Masters degree in Astronomy from the University of Sussex, and has recently completed a PhD in Radio Astronomy at the University of Manchester. He is currently working as a research fellow in the Astrophysics and Supercomputing Department of Swinburne University of Technology in Melbourne, Australia.
