Supporting data management on cluster grids
Introduction
Cluster grid computing [1], [2] can be considered a practical alternative to both grid computing [3], [4], [5] and traditional parallel computing based on supercomputing systems and on clusters of workstations exploited as a single, coherent, high performance computing resource. However, while clusters of workstations are currently used to build computing platforms with a high performance/cost ratio, grid computing still means integrating heterogeneous computing resources of widely varying capabilities, connected by potentially unreliable, heterogeneous networks, and located in different administrative domains. This can create many problems for programmers, who have to deal with highly variable communication delays, security threats, machine and network failures, and the distributed ownership of computing resources if they want to properly configure and optimize their large-scale applications in a grid computing context.
On the other hand, traditional parallel computing, particularly that based on clusters of workstations, tends to use networked computing resources widely available within the so-called “departmental” organizations, such as research centers, universities, and business enterprises, where problems concerning security, ownership, and configuration of the used networked resources are commonly and easily solved within a single administrative domain [6].
However, it is also worth noting that most of the computing resources existing within departmental organizations are often computing nodes belonging to non-routable, private networks and connected to the Internet through publicly addressable IP front-end nodes [6]. As a consequence, such resources cannot be considered actually available to run large-scale applications, since they cannot be easily exploited by widely used, conventional parallel programming systems, such as PVM (Parallel Virtual Machine) [7] or MPI (Message Passing Interface) [8], or by middleware for grid computing [3], [5].
ePVM [6], [9] is an extension of PVM [7], [10], [11], a well-known programming system mainly used by the scientific community to develop high performance, large-scale parallel applications on computing platforms with a high performance/cost ratio, such as clusters of workstations (COWs). The main goal of ePVM is to enable PVM applications to run efficiently on “cluster grids” spanning multidomain, non-routable networks. In fact, ePVM enables computing nodes, even those not provided with public IP addresses but connected to the Internet through publicly addressable IP front-end computing nodes, to be used as hosts in a single parallel virtual machine. Therefore, ePVM allows programmers to build and dynamically reconfigure a PVM-compliant parallel virtual machine that can be extended to comprise collections of COWs, provided that these are supplied with publicly addressable IP front-end nodes. As a consequence, even though all the computing nodes taking part in an ePVM virtual machine are actually arranged according to a physical network topology consisting of two hierarchical levels (the level of the publicly addressable IP nodes and the level of the non publicly addressable IP nodes belonging to COWs), they virtually appear as arranged according to a flat logical network topology.
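The mapping from the two-level physical topology to a flat logical topology can be sketched as follows. This is a minimal, illustrative Python model with invented names (`ClusterGrid`, `spawn`, `route`), not the actual ePVM implementation or API: tasks are identified by flat logical IDs, while routing internally distinguishes intra-cluster delivery from inter-cluster relaying through the publicly addressable front-ends.

```python
# Illustrative sketch (not the real ePVM code): flat logical task IDs
# layered over a two-level physical topology of front-ends and hidden nodes.

class ClusterGrid:
    def __init__(self):
        self.clusters = {}        # cluster name -> public front-end address
        self.tasks = {}           # flat task id -> (cluster, private node)
        self._next_tid = 0

    def add_cluster(self, name, front_end):
        self.clusters[name] = front_end

    def spawn(self, cluster, private_node):
        tid = self._next_tid
        self._next_tid += 1
        self.tasks[tid] = (cluster, private_node)
        return tid                # callers see only the flat logical id

    def route(self, src_tid, dst_tid):
        """Return the hops a message traverses between two tasks."""
        src_cluster, _ = self.tasks[src_tid]
        dst_cluster, dst_node = self.tasks[dst_tid]
        if src_cluster == dst_cluster:
            return [dst_node]     # intra-cluster: direct delivery
        # inter-cluster: relay through both publicly addressable front-ends
        return [self.clusters[src_cluster],
                self.clusters[dst_cluster],
                dst_node]

grid = ClusterGrid()
grid.add_cluster("cowA", "feA.example.org")
grid.add_cluster("cowB", "feB.example.org")
t0 = grid.spawn("cowA", "10.0.0.2")
t1 = grid.spawn("cowB", "10.0.1.7")
print(grid.route(t0, t1))   # crosses both front-ends
```

The point of the sketch is that application code only ever handles the flat IDs returned by `spawn`; the extra relay hops for inter-cluster traffic stay hidden inside routing, which is also why that traffic is more expensive.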
Furthermore, large-scale PVM applications often generate enormous volumes of data, which can be efficiently managed and distributed among the computing nodes by specific parallel file systems (PFSs). To this end, a number of PFSs were developed in the past [12], [13], [14]. Among them, PIOUS (Parallel Input/Output System) [15], [16], [17] was purposely designed to incorporate parallel I/O into existing parallel programming systems, and it has been widely used as a normal parallel application within PVM. Therefore, PIOUS could be directly exploited within ePVM. However, when executed under ePVM, PIOUS is usually penalized by the two-level physical network topology characterizing the cluster grids normally built by ePVM, and this ends up also penalizing the applications that use both ePVM and PIOUS.
This paper presents ePIOUS [18], an optimized implementation of PIOUS under ePVM. The implementation has been carried out taking into account the basic ideas and architecture of ePVM, and it fully exploits the two-level physical network topology characterizing the cluster grids built by ePVM. To this end, ePIOUS has also been provided with a specific file caching service that can speed up file accesses across clusters.
The outline of the paper is as follows. Section 2 summarizes the main PVM concepts. Section 3 describes the ePVM approach, whereas Section 4 briefly presents the ePVM programming system. Section 5 describes the ePIOUS parallel file system. Section 6 reports on some experimental results. Section 7 presents brief conclusions.
The PVM programming system
PVM [7], [10], [11] is a programming system based on the concept of a “virtual machine”, that is, a software abstraction of a distributed computing platform consisting of a set of cooperating processes, called “daemons”, that together supply the services required to run parallel applications as if they were on a distributed memory parallel computer (see Fig. 1).
Daemons can run on heterogeneous distributed computing nodes connected by a variety of networks. In particular, all the nodes making up
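The daemon-based virtual machine described above can be illustrated with a toy Python model. The names (`Daemon`, `VirtualMachine`, `send`) are invented for this sketch and do not reflect the real PVM API; the idea shown is only that each host runs one daemon, and application tasks exchange messages through the daemons rather than addressing hosts directly.

```python
# Toy model (assumed names, not the real PVM API): one daemon per host
# relays messages between application tasks in the virtual machine.

class Daemon:
    def __init__(self, host, vm):
        self.host, self.vm = host, vm
        self.inbox = {}                    # local task id -> queued messages

    def send(self, dst_tid, msg):
        # a task hands the message to its local daemon, which forwards it
        # to the daemon on the host where the destination task runs
        dst_daemon = self.vm.daemon_for(dst_tid)
        dst_daemon.inbox.setdefault(dst_tid, []).append(msg)

class VirtualMachine:
    def __init__(self, hosts):
        self.daemons = {h: Daemon(h, self) for h in hosts}
        self.placement = {}                # task id -> host

    def place(self, tid, host):
        self.placement[tid] = host

    def daemon_for(self, tid):
        return self.daemons[self.placement[tid]]

vm = VirtualMachine(["hostA", "hostB"])
vm.place(1, "hostA")
vm.place(2, "hostB")
vm.daemons["hostA"].send(2, "hello")
print(vm.daemons["hostB"].inbox[2])   # ['hello']
```

In real PVM the daemons additionally handle heterogeneity, process control, and fault notification; the sketch keeps only the routing role relevant here.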
The ePVM approach
PVM requires that all the hosts making up a virtual machine are IP addressable. In fact, PVM cannot exploit a COW provided only with one publicly addressable IP front-end computing node that hides from the Internet all the other nodes of the cluster. This means that PVM running on hosts outside such a COW cannot exploit the cluster’s internal nodes. Therefore, PVM appears to be inflexible in many respects, which can be constraining when the main goal of a software infrastructure is to aggregate
The ePVM programming system
ePVM extends PVM by introducing the abstraction of the “cluster” [6]. A cluster is a set of interconnected computing nodes provided with private IP addresses and hidden behind a publicly addressable IP front-end computing node (see Fig. 2). During computation, it is managed as a normal PVM virtual machine, where a “master” pvmd daemon is started on the front-end node, whereas “slaves” are started on all the other nodes of the cluster. However, the front-end node is also provided with a specific
ePIOUS
ePIOUS is the optimized implementation of PIOUS [15], [16], [17] under ePVM. It implements a PFS on cluster grids built using the ePVM system. However, ePIOUS cannot provide the same performance and functions characterizing commercial PFSs, such as GPFS [31], which are available only on the specific computing platforms for which the vendor has implemented them. On the other hand, ePIOUS cannot be considered a distributed file system, such as NFS [32], which is commonly designed
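The benefit of a caching service for cross-cluster file accesses can be sketched as follows. This is a hypothetical Python illustration in the spirit of the ePIOUS caching service, not its actual design: the class name, the block granularity, and the simple LRU policy are all assumptions made for the example.

```python
# Hypothetical sketch of a cross-cluster block cache; names and the LRU
# eviction policy are assumptions, not the actual ePIOUS caching service.
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity, fetch_remote):
        self.capacity = capacity
        self.fetch_remote = fetch_remote   # callable: (path, block) -> bytes
        self.blocks = OrderedDict()        # (path, block) -> data, LRU order
        self.hits = 0
        self.misses = 0

    def read(self, path, block_no):
        key = (path, block_no)
        if key in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(key)   # mark as most recently used
            return self.blocks[key]
        # miss: the expensive path, crossing the cluster front-ends
        self.misses += 1
        data = self.fetch_remote(path, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```

Repeated reads of the same remote block then pay the inter-cluster relay cost only once, which is precisely the case the two-level topology makes expensive.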
Experimental results
This section reports on a number of tests designed to estimate the performance potential provided by ePIOUS. The tests mainly measure the aggregated transfer rate in a variety of configurations, and are intended primarily as an indicator of how ePIOUS actually performs. However, many other tests have been conducted on ePIOUS, even if they are not reported in this section for the sake of brevity. To this end, it is worth noting that the behavior of ePIOUS is strongly influenced by ePVM, whose
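The aggregated transfer rate used in these tests is simply the total volume of data moved by all concurrent I/O tasks divided by the wall-clock measurement window. A minimal sketch (the function name and the sample figures are invented for illustration):

```python
def aggregated_transfer_rate(bytes_per_task, elapsed_seconds):
    """Aggregate bandwidth across concurrent I/O tasks: total bytes moved
    during the measurement window divided by the wall-clock time."""
    return sum(bytes_per_task) / elapsed_seconds

# Hypothetical example: four tasks each moving 256 MB in a 10 s window.
rate = aggregated_transfer_rate([256 * 2**20] * 4, 10.0)
print(rate / 2**20)   # 102.4 (MB/s)
```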
Conclusions and future work
ePIOUS is the PIOUS implementation under ePVM, a PVM extension that enables large-scale PVM applications to run on cluster grids spanning multidomain, non-routable networks. The limited modifications to the PIOUS library and the addition of a caching service that can speed up file accesses involving different clusters have enabled ePIOUS to achieve good performance in all the executed tests, and this essentially demonstrates that existing PVM applications can run on cluster grids
References (38)
- et al., Parallel I/O for distributed systems: Issues and implementation, Future Generation Computer Systems (1996)
- et al., The Globus project: A status report, Future Generation Computer Systems (1999)
- et al., Message-passing environments for metacomputing, Future Generation Computer Systems (1999)
- et al., Metacomputing across intercontinental networks, Future Generation Computer Systems (2001)
- et al., Towards OGSA compatibility in the H2O metacomputing framework, Future Generation Computer Systems (2005)
- W. Gentzsch, Grid computing, a vendor’s vision, in: Procs of the 2nd IEEE/ACM Int’l Symposium on Cluster Computing and...
- P. Stefán, The hungarian clustergrid project, in: Procs of the Int’l Convention MIPRO 2003, Opatija, Croatia,...
- et al., Grid Computing (2003)
- Running large-scale applications on cluster grids, International Journal of High Performance Computing Applications
- PVM: Parallel Virtual Machine. A Users’ Guide and Tutorial for Networked Parallel Computing
- A PVM extension to exploit cluster grids
- Recent enhancements to PVM, International Journal of Supercomputer Applications and High Performance Computing
- ViPIOS: The Vienna parallel input/output system
Franco Frattolillo received his Laurea degree “cum laude” in electronic engineering from the University of Napoli “Federico II”, Italy, in 1989/1990 and his Ph.D. degree in computer engineering, applied electromagnetics, and telecommunications from University of Salerno, Italy. He was a researcher with the Department of Electrical and Information Engineering of the University of Salerno from 1991 to 1998. In 1999 he joined the Faculty of Engineering of the University of Sannio, Italy, where he currently teaches high performance computing systems, advanced topics in computer networks, network security, and system programming. He is also a research leader with the Research Centre on Software Technology of the University of Sannio and at the Competence Centre on Information Technologies of the Campania Region, Italy. He has written several papers in the areas of parallel programming systems. His research interests also include metacomputing systems, cluster and grid computing, and network security.