Virtual private grid: a command shell for utilizing hundreds of machines efficiently
Introduction
Today, computer users commonly have access to hundreds of machines spread across multiple subnets and geographically distributed sites. In such environments, they can potentially achieve high performance for certain types of parallel applications (e.g., parameter sweep applications). Job submission tools, such as Condor [6] and PBS [11], enable a user to submit many jobs to clusters/supercomputers in a local network.
However, when available machines are distributed over multiple subnets, it becomes difficult and cumbersome to utilize them with these tools. We explain this problem below.
Machines distributed over multiple subnets are usually managed by different administrators, who impose various restrictions on their use for the sake of security and ease of administration. Examples of these restrictions are:
- (1) Firewall. A firewall protects local machines from malicious attacks by restricting access from external machines. For example, IP filtering (a typical firewall configuration) restricts connections from/to machines with a particular IP address.
- (2) Private IP. A private IP is an IP address that is visible only within a subnet. Machines outside a subnet cannot establish direct connections to machines that have only a private IP address. In addition, since a private IP address is visible only within a subnet, machines in different subnets may legitimately have the same private IP address. This breaks the uniqueness of IP addresses.
- (3) DHCP client. DHCP is a mechanism that enables machines to obtain their network configuration from a DHCP server. The IP address of a DHCP client may change whenever it obtains a new configuration (typically when it reboots).
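The private IP restriction can be illustrated with a short sketch that uses Python's standard `ipaddress` module. The subnet names, host names, and addresses below are hypothetical and are not taken from the paper; the point is only that two machines in different subnets can share one private address, so an IP address alone cannot identify a machine.

```python
import ipaddress

def is_private(addr: str) -> bool:
    """Return True if addr lies in a private (not globally routable) range."""
    return ipaddress.ip_address(addr).is_private

# Hypothetical inventory: (subnet, host) -> IP address.
# Two hosts in different subnets happen to use the same private address.
hosts = {
    ("subnet-a", "node1"): "192.168.0.10",
    ("subnet-b", "node7"): "192.168.0.10",
    ("subnet-a", "gateway"): "8.8.8.8",
}

def ambiguous_addresses(table):
    """Return the addresses that are used by more than one host."""
    by_addr = {}
    for host, addr in table.items():
        by_addr.setdefault(addr, []).append(host)
    return {addr: names for addr, names in by_addr.items() if len(names) > 1}

if __name__ == "__main__":
    for (subnet, name), addr in hosts.items():
        print(subnet, name, addr, "private" if is_private(addr) else "global")
    print(ambiguous_addresses(hosts))
```

Any naming scheme that must work across subnets therefore needs an identifier other than the IP address.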
Because of the above restrictions, machines cannot necessarily establish a direct connection to every other machine, and may not even have unique addresses. Hence, a user cannot easily submit a job across multiple subnets with existing tools; she/he has to work around the restrictions with ad hoc methods, which must be found case by case with human intervention. For example, when submitting a job to machines behind a firewall, a user usually has to log onto a gateway machine first and then onto the target. Accessing a DHCP client requires some database that stores its current IP address. The situation becomes more complicated if those addresses are private IP addresses.
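The manual gateway workaround amounts to nesting one remote login inside another. The sketch below builds such a chained `ssh` command line; it is not VPG's mechanism. The helper name `chained_ssh` is hypothetical, and the route in the usage example assumes, purely for illustration, that the target is reached through the gateway `harp`.

```python
import shlex

def chained_ssh(route, command):
    """Build an argv list that runs `command` on the last host of `route`,
    logging in through each earlier host in turn."""
    if not route:
        raise ValueError("route must contain at least one host")
    remote = command  # the string executed on the final host
    # Wrap from the innermost hop outward, quoting at each level so that
    # every intermediate shell passes the inner command on unchanged.
    for host in reversed(route[1:]):
        remote = "ssh " + shlex.quote(host) + " " + shlex.quote(remote)
    return ["ssh", route[0], remote]

if __name__ == "__main__":
    # Assumed route: local machine -> harp (gateway) -> tuba (target).
    print(chained_ssh(["harp", "tuba"], "uname -a"))
```

Each extra hop adds another level of quoting, and the route has to be recomputed by hand whenever a gateway becomes unavailable, which is what makes this approach impractical at the scale of a hundred machines.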
In addition, we must consider that the number of available machines is large (e.g., >100) and that the topology of the network usually changes dynamically (e.g., machines may crash or the network may become disconnected). These also make it difficult for a user to manually work around the administrative restrictions. For instance, a user has to know which machines are available at present, and she/he may need to find a new job forwarding route whenever a machine crashes.
To summarize, the administrative restrictions significantly increase the user’s cost for utilizing remote machines, and consequently, obstruct smooth utilization of computational resources. A user would like to have a solution in which all machines can be reached directly and transparently, with names fixed over time.
To this end, we have developed virtual private grid (VPG), a command shell that can utilize hundreds of machines efficiently. It enables a user to easily submit jobs to remote machines by providing the following mechanisms:
- (1) Nickname mechanism. It gives each machine a unique name that does not depend on a DNS name or a fixed IP address.
- (2) Communication mechanism. A user can directly access any remote machine that would otherwise be reachable only by applying existing tools (e.g., rsh, SSH [12]) several times in succession from her/his local machine. A user accesses these machines merely by specifying their nicknames.
The above mechanisms can be implemented without modifying administrative policies, and they tolerate dynamic changes in the network topology, though some manual configuration is required.
The remainder of this paper is organized as follows. Section 2 shows the difficulty of working around administrative restrictions through a practical motivating scenario and discusses issues in the implementation of the communication mechanism. Section 3 describes the user interface of VPG and the manual configurations that it requires. Section 4 presents the details of the communication mechanism. Section 5 shows the experimental results. Section 6 mentions related work. The final section summarizes the paper and states future work.
Section snippets
A practical scenario
In this section, we show the difficulty of working around administrative restrictions through a practical motivating scenario. Consider the network shown in Fig. 1. Harp, tuba, … in Fig. 1 represent host names. This network consists of three subnets including a DHCP client and a machine that has only a private IP. Firewalls restrict connections between different subnets; SSH to the gateway machines (harp, cscl0, and ise0) is the only allowed in-bound connection. Such a configuration is fairly
Overview of VPG
The following summarizes the functions provided by VPG:
- It gives each machine a (per-user) unique name that does not depend on a DNS name or a fixed IP address (nicknaming).
- It provides job submission to any nicknamed machine.
- It provides redirection from/to a file on any nicknamed machine.
- It provides pipes between commands executed on any nicknamed machines.
We make the above functions accessible by a combination of the simple shell syntax and existing commands (see Fig. 2). For example, a
Communication mechanism
VPG implements the communication mechanism by constructing a spanning tree and forwarding messages via a path in the tree. In this section, we describe details of this communication mechanism. First, we formalize administrative restrictions. Next, we mention the self-stabilizing spanning tree algorithm [2], [3]. With this algorithm, VPG daemons select necessary connections to make all the machines available. Then, we present the algorithm that calculates routes to participating machines for job
Experimental environment
We ran VPG in the network shown in Fig. 6. The network consists of three subnets, and machines are equipped with several operating systems (Solaris, Linux, and IRIX) and CPUs (SPARC, x86, PowerPC, and MIPS). We ran VPG daemons on about 100 nodes. In this experiment, daemons constructed a spanning tree that had a diameter of 5.
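The idea of forwarding messages along a path in a spanning tree, and the notion of tree diameter reported here, can be illustrated with a small sketch. It substitutes a centralized breadth-first search for the self-stabilizing distributed algorithm that VPG uses, treats every allowed connection as symmetric, and runs on a made-up topology: `harp`, `tuba`, `cscl0`, and `ise0` are host names from the paper, while `local`, `cscl1`, and the links between machines are assumptions.

```python
from collections import deque

def spanning_tree(links, root):
    """Breadth-first search over allowed connections.
    Returns a parent map; the root maps to None."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(links.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent

def tree_path(parent, src, dst):
    """Return the unique path from src to dst that uses tree edges only."""
    def to_root(node):
        chain = []
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain
    up, down = to_root(src), to_root(dst)
    # Drop shared ancestors until only the lowest common ancestor remains.
    while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
        up.pop()
        down.pop()
    return up + list(reversed(down[:-1]))

def diameter(parent):
    """Longest tree path between any two nodes, counted in edges."""
    nodes = list(parent)
    return max(len(tree_path(parent, a, b)) - 1 for a in nodes for b in nodes)

# Hypothetical topology (symmetric links; not the network of Fig. 6).
links = {
    "local": {"harp", "cscl0", "ise0"},
    "harp": {"local", "tuba"},
    "tuba": {"harp"},
    "cscl0": {"local", "cscl1"},
    "cscl1": {"cscl0"},
    "ise0": {"local"},
}

if __name__ == "__main__":
    tree = spanning_tree(links, "local")
    print(tree_path(tree, "tuba", "cscl1"))
    print(diameter(tree))
```

In this toy tree a message from `tuba` to `cscl1` is relayed through `harp`, `local`, and `cscl0`, and the diameter is 4. The diameter bounds the number of hops any forwarded message has to take.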
Comparison to other job submission tools
We compared VPG with three other job submission tools: rsh, SSH, and globus-job-run (globus-job-run is a remote job submission tool provided by Globus). rsh used Rhost
Resource management
Many remote job submission tools on clusters or on Grid environments have been developed (e.g., Globus [5], Condor [6], Nimrod [4]).
To the authors’ knowledge, none of them focuses on integrating the many ‘desktop’ resources that are typically configured without DNS names and with DHCP/private IP addresses. Such machines constitute a large fraction of compute resources. The original Globus is blocked by typical firewall configurations and cannot submit jobs from outside a firewall to machines inside it.
Summary and future work
In this paper, we have described VPG, a shell with which a user can easily utilize hundreds of machines distributed over multiple subnets. The shell gives each machine a unique nickname that does not depend on a DNS name or a fixed IP address. In addition, it provides job submission, redirection, and pipes on any nicknamed machine. VPG implements these functions by constructing a self-stabilizing spanning tree among machines and forwarding messages via a path in the tree. The latest implementation of VPG
Kenji Kaneda is a Master Course student at the Department of Computer Science, University of Tokyo. He received the BE degree in Information Science from University of Tokyo in 2001. His current major research interests are in the area of parallel and distributed systems.
References (13)
- C. Scott, P. Wolfe, M. Erwin, Virtual Private Networks, 2nd ed., O’Reilly,...
- Y. Afek, S. Kutten, M. Yung, Memory efficient self-stabilizing protocols for general network, in: Proceedings of the...
- S. Aggarwal, S. Kutten, Time optimal self-stabilizing spanning tree algorithms, in: Proceedings of the 13th Conferences...
- R. Buyya, D. Abramson, J. Giddy, Nimrod/G: an architecture for a resource management and scheduling system in a global...
- K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, S. Tuecke, A resource management architecture...
- J. Frey, T. Tannenbaum, I. Foster, M. Livny, S. Tuecke, Condor-G: a computation management agent for...
Kenjiro Taura is an Associate Professor at the Department of Information and Communication Engineering, University of Tokyo. He was born in 1969, and received his BS, MS, and DSc degrees from University of Tokyo in 1992, 1994, and 1997, respectively. His major research interests include parallel/distributed computing and programming languages. He is a Member of ACM and IEEE.
Akinori Yonezawa is a Professor at the Department of Computer Science, University of Tokyo, and an ACM Fellow. He received his PhD in Computer Science from MIT in 1977. His current major research interests are in the areas of concurrent/parallel computation models, programming languages, object-oriented computing, and distributed computing. He is the author and editor of several books including: “Object-oriented concurrent programming” (MIT Press, 1987); “ABCL: an object-oriented concurrent system” (MIT Press, 1990). He served as an Associate Editor of ACM Transactions on Programming Languages and Systems (TOPLAS).