Supporting data management on cluster grids
Introduction
Cluster grid computing [1], [2] can be considered a practical alternative to both grid computing [3], [4], [5] and traditional parallel computing based on supercomputing systems and on clusters of workstations exploited as a single, coherent, high performance computing resource. However, while clusters of workstations are currently used to build computing platforms with a high performance/cost ratio, grid computing still means integrating heterogeneous computing resources of widely varying capabilities, connected by potentially unreliable, heterogeneous networks, and located in different administrative domains. This can create many problems for programmers, who have to deal with highly variable communication delays, security threats, machine and network failures, and the distributed ownership of computing resources if they want to properly configure and optimize their large-scale applications in a grid computing context.
On the other hand, traditional parallel computing, particularly that based on clusters of workstations, tends to use networked computing resources widely available within the so-called “departmental” organizations, such as research centers, universities, and business enterprises, where problems concerning security, ownership, and configuration of the used networked resources are commonly and easily solved within a single administrative domain [6].
However, it is also worth noting that most of the computing resources existing within departmental organizations are often computing nodes belonging to non-routable, private networks and connected to the Internet through publicly addressable IP front-end nodes [6]. As a consequence, such resources cannot be considered actually available to run large-scale applications, since they cannot be easily exploited by widely used, conventional parallel programming systems, such as PVM (Parallel Virtual Machine) [7] or MPI (Message Passing Interface) [8], or by middleware for grid computing [3], [5].
ePVM [6], [9] is an extension of PVM [7], [10], [11], a well-known programming system mainly used by the scientific community to develop high performance, large-scale parallel applications on computing platforms with a high performance/cost ratio, such as clusters of workstations (COWs). The main goal of ePVM is to enable PVM applications to run efficiently on “cluster grids” spanning multidomain, non-routable networks. In fact, ePVM enables computing nodes, even those not provided with public IP addresses but connected to the Internet through publicly addressable IP front-end computing nodes, to be used as hosts in a single parallel virtual machine. Therefore, ePVM allows programmers to build and dynamically reconfigure a PVM-compliant parallel virtual machine that can be extended to comprise collections of COWs, provided that these are supplied with publicly addressable IP front-end nodes. As a consequence, even though all the computing nodes taking part in an ePVM virtual machine are actually arranged according to a physical network topology consisting of two hierarchical levels (the level of the publicly addressable IP nodes and the level of the non publicly addressable IP nodes belonging to COWs), they virtually appear as arranged according to a flat logical network topology.
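The mapping from the two-level physical topology to a flat logical topology can be sketched as follows. This is a minimal, illustrative Python model with invented names (`ClusterGrid`, `spawn`, `route`), not the actual ePVM implementation or API: tasks are identified by flat logical IDs, while routing internally distinguishes intra-cluster delivery from inter-cluster relaying through the publicly addressable front-ends.

```python
# Illustrative sketch (not the real ePVM code): flat logical task IDs
# layered over a two-level physical topology of front-ends and hidden nodes.

class ClusterGrid:
    def __init__(self):
        self.clusters = {}        # cluster name -> public front-end address
        self.tasks = {}           # flat task id -> (cluster, private node)
        self._next_tid = 0

    def add_cluster(self, name, front_end):
        self.clusters[name] = front_end

    def spawn(self, cluster, private_node):
        tid = self._next_tid
        self._next_tid += 1
        self.tasks[tid] = (cluster, private_node)
        return tid                # callers see only the flat logical id

    def route(self, src_tid, dst_tid):
        """Return the hops a message traverses between two tasks."""
        src_cluster, _ = self.tasks[src_tid]
        dst_cluster, dst_node = self.tasks[dst_tid]
        if src_cluster == dst_cluster:
            return [dst_node]     # intra-cluster: direct delivery
        # inter-cluster: relay through both publicly addressable front-ends
        return [self.clusters[src_cluster],
                self.clusters[dst_cluster],
                dst_node]

grid = ClusterGrid()
grid.add_cluster("cowA", "feA.example.org")
grid.add_cluster("cowB", "feB.example.org")
t0 = grid.spawn("cowA", "10.0.0.2")
t1 = grid.spawn("cowB", "10.0.1.7")
print(grid.route(t0, t1))   # crosses both front-ends
```

The point of the sketch is that application code only ever handles the flat IDs returned by `spawn`; the extra relay hops for inter-cluster traffic stay hidden inside routing, which is also why that traffic is more expensive.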
Furthermore, large-scale PVM applications often generate enormous volumes of data, which can be efficiently managed and distributed among the computing nodes by specific parallel file systems (PFSs). To this end, a number of PFSs were developed in the past [12], [13], [14]. Among them, PIOUS (Parallel Input/Output System) [15], [16], [17] was purposely designed to incorporate parallel I/O into existing parallel programming systems, and it has been widely used as a normal parallel application within PVM. Therefore, PIOUS could be directly exploited within ePVM. However, when executed under ePVM, PIOUS is usually penalized by the two-level physical network topology characterizing the cluster grids normally built by ePVM, and this ends up also penalizing the applications that use both ePVM and PIOUS.
This paper presents ePIOUS [18], an optimized implementation of PIOUS under ePVM. The implementation has been carried out taking into account the basic ideas and architecture of ePVM, and it fully exploits the two-level physical network topology characterizing the cluster grids built by ePVM. To this end, ePIOUS has also been provided with a specific file caching service that can speed up file accesses across clusters.
The outline of the paper is as follows. Section 2 summarizes the main PVM concepts. Section 3 describes the ePVM approach, whereas Section 4 briefly presents the ePVM programming system. Section 5 describes the ePIOUS parallel file system. Section 6 reports on some experimental results. Section 7 presents brief conclusions.
The PVM programming system
PVM [7], [10], [11] is a programming system based on the concept of a “virtual machine”, that is, a software abstraction of a distributed computing platform consisting of a set of cooperating processes, called “daemons”, that together supply the services required to run parallel applications as if they were on a distributed memory parallel computer (see Fig. 1).
Daemons can run on heterogeneous distributed computing nodes connected by a variety of networks. In particular, all the nodes making up
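The daemon-based virtual machine described above can be illustrated with a toy Python model. The names (`Daemon`, `VirtualMachine`, `send`) are invented for this sketch and do not reflect the real PVM API; the idea shown is only that each host runs one daemon, and application tasks exchange messages through the daemons rather than addressing hosts directly.

```python
# Toy model (assumed names, not the real PVM API): one daemon per host
# relays messages between application tasks in the virtual machine.

class Daemon:
    def __init__(self, host, vm):
        self.host, self.vm = host, vm
        self.inbox = {}                    # local task id -> queued messages

    def send(self, dst_tid, msg):
        # a task hands the message to its local daemon, which forwards it
        # to the daemon on the host where the destination task runs
        dst_daemon = self.vm.daemon_for(dst_tid)
        dst_daemon.inbox.setdefault(dst_tid, []).append(msg)

class VirtualMachine:
    def __init__(self, hosts):
        self.daemons = {h: Daemon(h, self) for h in hosts}
        self.placement = {}                # task id -> host

    def place(self, tid, host):
        self.placement[tid] = host

    def daemon_for(self, tid):
        return self.daemons[self.placement[tid]]

vm = VirtualMachine(["hostA", "hostB"])
vm.place(1, "hostA")
vm.place(2, "hostB")
vm.daemons["hostA"].send(2, "hello")
print(vm.daemons["hostB"].inbox[2])   # ['hello']
```

In real PVM the daemons additionally handle heterogeneity, process control, and fault notification; the sketch keeps only the routing role relevant here.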
The ePVM approach
PVM requires that all the hosts making up a virtual machine are IP addressable. In fact, PVM cannot exploit a COW provided only with one publicly addressable IP front-end computing node that hides from the Internet all the other nodes of the cluster. This means that PVM running on hosts outside such a COW cannot exploit the cluster’s internal nodes. Therefore, PVM appears to be inflexible in many respects, which can be constraining when the main goal of a software infrastructure is to aggregate
The ePVM programming system
ePVM extends PVM by introducing the abstraction of the “cluster” [6]. A cluster is a set of interconnected computing nodes provided with private IP addresses and hidden behind a publicly addressable IP front-end computing node (see Fig. 2). During computation, it is managed as a normal PVM virtual machine, where a “master” pvmd daemon is started on the front-end node, whereas “slaves” are started on all the other nodes of the cluster. However, the front-end node is also provided with a specific
ePIOUS
ePIOUS is the optimized implementation of PIOUS [15], [16], [17] under ePVM. It implements a PFS on cluster grids built using the ePVM system. However, ePIOUS cannot provide the same performance and functions characterizing commercial PFSs, such as GPFS [31], which are available only on the specific computing platforms for which the vendor has implemented them. On the other hand, ePIOUS cannot be considered a distributed file system, such as NFS [32], which is commonly designed
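The benefit of a caching service for cross-cluster file accesses can be sketched as follows. This is a hypothetical Python illustration in the spirit of the ePIOUS caching service, not its actual design: the class name, the block granularity, and the simple LRU policy are all assumptions made for the example.

```python
# Hypothetical sketch of a cross-cluster block cache; names and the LRU
# eviction policy are assumptions, not the actual ePIOUS caching service.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote   # callable: (path, block) -> bytes
        self.blocks = OrderedDict()        # (path, block) -> data, LRU order
        self.hits = 0
        self.misses = 0

    def read(self, path, block_no):
        key = (path, block_no)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)   # mark as most recently used
            return self.blocks[key]
        # miss: the expensive path, crossing the cluster front-ends
        self.misses += 1
        data = self.fetch_remote(path, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```

Repeated reads of the same remote block then pay the inter-cluster relay cost only once, which is precisely the case the two-level topology makes expensive.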
Experimental results
This section reports on a number of tests designed to estimate the performance potential provided by ePIOUS. The tests mainly measure the aggregated transfer rate in a variety of configurations, and are intended primarily as an indicator of how ePIOUS actually performs. However, many other tests have been conducted on ePIOUS, even if they are not reported in this section for the sake of brevity. To this end, it is worth noting that the behavior of ePIOUS is strongly influenced by ePVM, whose
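The aggregated transfer rate used in these tests is simply the total volume of data moved by all concurrent I/O tasks divided by the wall-clock measurement window. A minimal sketch (the function name and the sample figures are invented for illustration):

```python
def aggregated_transfer_rate(bytes_per_task, elapsed_seconds):
    """Aggregate bandwidth across concurrent I/O tasks: total bytes moved
    during the measurement window divided by the wall-clock time."""
    return sum(bytes_per_task) / elapsed_seconds

# Hypothetical example: four tasks each moving 256 MB in a 10 s window.
rate = aggregated_transfer_rate([256 * 2**20] * 4, 10.0)
print(rate / 2**20)   # 102.4 (MB/s)
```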
Conclusions and future work
ePIOUS is the PIOUS implementation under ePVM, a PVM extension that enables large-scale PVM applications to run on cluster grids spanning multidomain, non-routable networks. The limited modifications to the PIOUS library and the addition of a caching service that can speed up file accesses involving different clusters have enabled ePIOUS to achieve good performance in all the executed tests, and this essentially demonstrates that existing PVM applications can run on cluster grids
References (38)
- et al., Parallel I/O for distributed systems: Issues and implementation, Future Generation Computer Systems (1996)
- et al., The Globus project: A status report, Future Generation Computer Systems (1999)
- et al., Message-passing environments for metacomputing, Future Generation Computer Systems (1999)
- et al., Metacomputing across intercontinental networks, Future Generation Computer Systems (2001)
- et al., Towards OGSA compatibility in the H2O metacomputing framework, Future Generation Computer Systems (2005)
- W. Gentzsch, Grid computing, a vendor’s vision, in: Procs of the 2nd IEEE/ACM Int’l Symposium on Cluster Computing and...
- P. Stefán, The hungarian clustergrid project, in: Procs of the Int’l Convention MIPRO 2003, Opatija, Croatia,...
- et al., Grid Computing (2003)
- Running large-scale applications on cluster grids, International Journal of High Performance Computing Applications
- PVM: Parallel Virtual Machine. A Users’ Guide and Tutorial for Networked Parallel Computing
- A PVM extension to exploit cluster grids
- Recent enhancements to PVM, International Journal of Supercomputer Applications and High Performance Computing
- ViPIOS: The Vienna parallel input/output system
Franco Frattolillo received his Laurea degree “cum laude” in electronic engineering from the University of Napoli “Federico II”, Italy, in 1989/1990 and his Ph.D. degree in computer engineering, applied electromagnetics, and telecommunications from University of Salerno, Italy. He was a researcher with the Department of Electrical and Information Engineering of the University of Salerno from 1991 to 1998. In 1999 he joined the Faculty of Engineering of the University of Sannio, Italy, where he currently teaches high performance computing systems, advanced topics in computer networks, network security, and system programming. He is also a research leader with the Research Centre on Software Technology of the University of Sannio and at the Competence Centre on Information Technologies of the Campania Region, Italy. He has written several papers in the areas of parallel programming systems. His research interests also include metacomputing systems, cluster and grid computing, and network security.