Metacomputing across intercontinental networks

https://doi.org/10.1016/S0167-739X(01)00032-2

Abstract

An intercontinental network of supercomputers spanning more than 10 000 miles and running challenging scientific applications was realized at the Supercomputing ’99 (SC99) conference in Portland, OR, using PACX-MPI and ATM PVCs. In this paper, we describe how we constructed this heterogeneous cluster of supercomputers, the multi-architecture problems we confronted, and the way several applications handled the specific requirements of a metacomputer.

Overview of the SC99 global network

During SC99 a network connection was set up that connected systems in Europe, the US and Japan. For the experiments described in this paper, four supercomputers were linked together.

  • A Hitachi SR8000 at Tsukuba/Japan with 512 processors (64 nodes).

  • A Cray T3E-900/512 at Pittsburgh/USA with 512 processors.

  • A Cray T3E-1200/576 at Manchester/UK with 576 processors.

  • A Cray T3E-900/512 at Stuttgart/Germany with 512 processors.

Together this virtual supercomputer had a peak performance of roughly 2.1 Tflops.
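
As a rough cross-check of this figure, the sketch below sums the nominal peaks of the four machines. The per-node and per-processor values (about 8 Gflop/s per SR8000 node, 900 and 1200 Mflop/s per T3E processor) are standard vendor figures assumed by us, not numbers quoted in the text above.

    /* Back-of-the-envelope aggregate peak of the SC99 metacomputer.
     * The per-node and per-PE peaks are nominal vendor figures assumed
     * here; they are not quoted in the text above. */
    #include <stdio.h>

    int main(void)
    {
        double gflops = 0.0;
        gflops += 64  * 8.0;   /* Hitachi SR8000, 64 nodes, ~8 Gflop/s per node */
        gflops += 512 * 0.9;   /* Cray T3E-900/512, 900 Mflop/s per processor   */
        gflops += 576 * 1.2;   /* Cray T3E-1200/576, 1200 Mflop/s per processor */
        gflops += 512 * 0.9;   /* Cray T3E-900/512, 900 Mflop/s per processor   */
        printf("aggregate peak: %.2f Tflop/s\n", gflops / 1000.0);  /* ~2.12 */
        return 0;
    }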

Problems of multi-architecture

Heterogeneous metacomputing introduces some problems that are similar to those well known from cluster computing and some that are very specific [2]. The most important ones are

  • different data representations on each system,

  • different processor speeds on each system,

  • different communication speeds for internal messages on each system,

  • different communication speeds of messages internal to a system and messages between systems,

  • lack of a common file system,

  • lack of resource management.

The problem of …
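
The last two communication-related points in the list above are typically the ones an application has to address explicitly. PACX-MPI presents all participating machines as a single MPI_COMM_WORLD, so a code that wants to keep latency-sensitive traffic inside one machine has to recover that hierarchy itself. The fragment below is a minimal sketch of one way to do so, grouping ranks by processor name with MPI_Comm_split; it is our illustration, not part of PACX-MPI or of the applications described here, and a production code would use a proper site identifier rather than a name hash.

    /* Sketch: derive a per-machine communicator inside the metacomputer-wide
     * MPI_COMM_WORLD provided by PACX-MPI, so that frequent exchanges stay on
     * the fast internal network and only aggregated data crosses the wide-area
     * links.  Hashing the processor name is only a stand-in; a production code
     * would use a site identifier supplied by the runtime or a config file. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int world_rank, local_rank, len, i, colour = 0;
        char host[MPI_MAX_PROCESSOR_NAME];
        MPI_Comm local;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Get_processor_name(host, &len);

        for (i = 0; i < len; i++)              /* crude hash of the host name */
            colour = (colour * 31 + host[i]) & 0x7fffffff;

        MPI_Comm_split(MPI_COMM_WORLD, colour, world_rank, &local);
        MPI_Comm_rank(local, &local_rank);

        printf("global rank %d is local rank %d on %s\n",
               world_rank, local_rank, host);

        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }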

Use of experimental data

In this section, we describe a novel use of metacomputing, namely the processing of data from an experimental facility (in this case the Jodrell Bank radio telescope). This presents particular challenges to our metacomputer, since the distribution of the data makes severe demands on the intercontinental bandwidth. This experiment coupled only the three T3E machines.
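
The snippet does not describe the actual data-handling pipeline, but one common pattern for this kind of experiment, shown below purely as our own illustration, is to let the rank nearest the data source cut the incoming stream into blocks and scatter them over the metacomputer-wide communicator, so that each sample crosses the wide-area links exactly once before being processed locally. The block size and data layout in the sketch are invented placeholders.

    /* Toy sketch: rank 0 (assumed to sit near the data source) scatters
     * fixed-size blocks of observation data across all ranks of the
     * metacomputer-wide communicator.  The block size and float layout are
     * invented placeholders, not values from the experiment. */
    #include <mpi.h>
    #include <stdlib.h>

    #define BLOCK 262144   /* samples handed to each rank per round (assumed) */

    int main(int argc, char **argv)
    {
        int rank, size;
        float *all = NULL, *mine;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        mine = malloc(BLOCK * sizeof(float));
        if (rank == 0)     /* rank 0 reads or receives the raw telescope data */
            all = calloc((size_t)size * BLOCK, sizeof(float));

        /* Each sample crosses the wide-area links exactly once; the received
           block is then processed on the fast local machine. */
        MPI_Scatter(all, BLOCK, MPI_FLOAT, mine, BLOCK, MPI_FLOAT,
                    0, MPI_COMM_WORLD);

        /* ... process the local block here ... */

        free(mine);
        free(all);
        MPI_Finalize();
        return 0;
    }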

Computational fluid dynamics

Another application used during SC99 was a CFD code called URANUS (Upwind Relaxation Algorithm for Nonequilibrium flows of the University of Stuttgart) [4]. This program has been developed to simulate the reentry phase of a space vehicle over a wide altitude and velocity range. URANUS was tested in such a metacomputing environment because its non-equilibrium part has been finished and will be parallelized soon. For this version of URANUS, the memory requirements of a large configuration exceed …

Molecular dynamics

A further application that was adapted for metacomputing is a molecular dynamics program for short-range interactions [5] that simulates granular matter. The parallel paradigm applied here is domain decomposition and message passing with MPI (Message Passing Interface): every CPU is responsible for a part of the domain and has to exchange information about the border of its domain with its adjacent neighbors. Static load balancing can be implemented by assigning domains of different size …
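
A minimal sketch of the two ingredients mentioned above, border (halo) exchange between adjacent sub-domains and static load balancing by giving faster processors larger sub-domains, is given below. It is our illustration rather than the authors' code; the speed weights and cell count are hypothetical, and a real MD code would decompose the domain in two or three dimensions.

    /* Sketch: 1D domain decomposition with static load balancing and
     * halo (border) exchange, the two ingredients named above.  The speed
     * weights and cell count are hypothetical placeholders. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NCELLS_TOTAL 1000000   /* total number of cells in the 1D domain */

    int main(int argc, char **argv)
    {
        int rank, size, my_cells, left, right;
        double my_speed, total_speed, *cell;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Hypothetical relative speed of the processor this rank runs on,
           e.g. 0.9 for a T3E-900 PE and 1.2 for a T3E-1200 PE. */
        my_speed = (rank % 2 == 0) ? 0.9 : 1.2;
        MPI_Allreduce(&my_speed, &total_speed, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        /* Static load balancing: domain size proportional to relative speed. */
        my_cells = (int)(NCELLS_TOTAL * my_speed / total_speed);
        cell = calloc(my_cells + 2, sizeof(double));   /* +2 halo cells */

        left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Halo exchange: send own border cells, receive the neighbours'. */
        MPI_Sendrecv(&cell[1], 1, MPI_DOUBLE, left, 0,
                     &cell[my_cells + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&cell[my_cells], 1, MPI_DOUBLE, right, 0,
                     &cell[0], 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d owns %d cells\n", rank, my_cells);
        free(cell);
        MPI_Finalize();
        return 0;
    }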

Conclusion and future plan

We have described the realization of a global metacomputer comprising supercomputers in Japan, the USA, Germany and the UK. The network, spanning 10 000 miles and several research networks, was optimized by using either dedicated PVCs or improved routes. During SC99, the usability of this metacomputer was demonstrated by several demanding applications. This takes the concept of a global metacomputer forward in two major ways. Firstly, we have shown that it is possible to link together …

Acknowledgements

The authors would like to thank Tsukuba Advanced Computing Center, Pittsburgh Supercomputing Center, Manchester Computing Center, High-performance Computing Center Stuttgart, BelWü, DFN, JANET, TEN-155, Abilene, vBNS, STAR-TAP, TransPAC, APAN and IMnet for their support in these experiments.

Stephen Pickles has been involved in both commercial and scientific computing since 1982 when he joined ICL (Australia) as trainee programmer. He graduated in physics from Macquarie University in 1994, where he won the university medal. He obtained his PhD in lattice QCD from the University of Edinburgh in 1998. He is currently the leader of the CSAR Technology Refresh Team at the University of Manchester. He has publications in quantum optics, lattice QCD, and grid computing. His current research interests are in parallel and distributed computing techniques and their application to scientific problems.

John Brooke graduated in mathematics from the University of Manchester in 1973 and gained a Post-Graduate certificate in education in 1974. From 1974 to 1990 he taught mathematics to children aged between 7 and 16, specializing in work with pupils from disadvantaged social and economic backgrounds. In 1990, he joined the University of Manchester to develop a training programme in high-performance computing and completed a PhD in mathematics in 1997. He currently leads a team of researchers and consultants working on projects to develop grid computing on local, national and global scales. He is also active in research in astrophysics and nonlinear mathematics, with special reference to the study of intermittency and symmetry-breaking in rotating systems. He is a Fellow of the Royal Astronomical Society and an honorary lecturer in mathematics at the University of Manchester. More details of his research can be found at http://www.csar.cfs.ac.uk/staff/brooke.

Fumie Costen received the B.E. and M.E. degrees in electronics from Kyoto University, Japan, in 1991 and 1993, respectively. In 1993 and 1996, she joined, respectively, the Optical and Radio Communications Research Laboratories and the Adaptive Communications Research Laboratories of Advanced Telecommunication Research International to continue her work on direction-of-arrival (DOA) estimation. In 1998, she joined Manchester Computing at the University of Manchester and carried out research on metacomputing. Since 2000, she has been a lecturer in the Department of Computer Science at the University of Manchester. She is widening her research area from DOA estimation to signal processing for wireless mobile telecommunications with CDMA/OFDM.

Edgar Gabriel received his diploma degree in mechanical engineering from the University of Stuttgart in 1998. Since June 1998 he has been a member of the Parallel Computing Department at the High-performance Computing Center Stuttgart (HLRS), where he is involved in the metacomputing activities and coordinates the development of PACX-MPI. Furthermore, he is responsible for the technical management of several national and international projects. Since January 2001, he has been leading the working group Parallel and Distributed Systems.

Matthias Mueller is a research scientist at HLRS. He received his diploma degree in physics from the University of Stuttgart in 1996; his master's thesis on quasicrystals was carried out at the Department of Theoretical and Applied Physics. He then wrote his PhD thesis at the Institute of Computer Applications in Stuttgart on fast algorithms for particle simulations. During this period he started to work on a metacomputing project, in which he participated with his applications. His scientific interests are metacomputing with a focus on applications, physics on high-performance computers and object-oriented programming for parallel computing.

Michael M. Resch received his diploma degree in technical mathematics from the Technical University of Graz, Austria, in 1990. He presented his PhD thesis on “Metacomputing for engineering applications” at the University of Stuttgart, Germany, in 2001. From 1990 to 1993 he was with JOANNEUM RESEARCH, a leading Austrian research company. Since 1993 he has been with the High-performance Computing Center Stuttgart (HLRS); since 1998 he has headed the Parallel Computing Group of HLRS, and since 2000 he has been head of HLRS. His current research interests include parallel programming models, metacomputing and the numerical simulation of blood flow. He was and is involved in a number of European projects, such as CAESAR, HPS-ICE, METODIS, DECAST and DAMIEN, and is responsible for the international collaborations of the HLRS.

Stephen Ord obtained an honours degree in Physics from Leicester University and a Masters degree in Astronomy from the University of Sussex, and has recently completed a PhD in Radio Astronomy at the University of Manchester. He is currently working as a research fellow in the Astrophysics and Supercomputing Department of Swinburne University of Technology in Melbourne, Australia.
