DOI: 10.1145/1341811.1341857

A common application platform for the SURAgrid (CAP)

Published: 29 January 2008

Abstract

From our experience developing and deploying research applications on a regional grid infrastructure (SURAgrid, www.sura.org/suragrid), we observe that there are significant barriers to "grid-enabling" applications and, therefore, to realizing the full benefit of a grid environment. To increase both the number and the variety of applications that can run on SURAgrid, we propose to develop an integrated environment that directly supports MPI-based applications on a subset of SURAgrid resources. Our goal is to emulate the environment of a single Beowulf-style cluster on SURAgrid for easy access by MPI-based distributed-memory applications, which are readily available in almost all fields of science and engineering. Although a single application can engineer a similar environment and performance using the various grid services, a persistent environment such as CAP can significantly reduce the complexity of deploying an application while extending the benefits to a much larger set of applications.
This paper describes the architecture and initial implementation of the Common Application Platform (CAP) on SURAgrid. CAP provides an integrated platform for scheduling and executing sequential and parallel jobs on CAP-enabled resources in a user-friendly environment. In particular, we aim to provide the following capabilities:
  • Meta-scheduling: Co-scheduling is needed for applications whose large memory requirements can be met only by using resources at multiple sites simultaneously. Automatic resource selection across multiple sites will improve the overall utilization of the grid. The scheduling and job management capabilities in CAP are provided by the GridWay metascheduler.
  • Orchestration: For parallel applications on CAP, the orchestration capabilities are provided by MPICH-G2, a grid-enabled version of the popular MPI implementation MPICH. Both GridWay and MPICH-G2 depend on the Globus Toolkit to provide the basic grid functionalities.
  • Fast data transfer: High-performance networks, such as the National Lambda Rail that connects many SURAgrid resources, can be exploited by performing striped (parallel) file transfers. This project will explore such transfers as a way to enhance inter-cluster communication.
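The co-scheduling idea in the meta-scheduling capability above can be illustrated with a small sketch. It is not GridWay's algorithm or API: the `Site` record, its fields, the `select_sites` helper, and the site names and sizes are all hypothetical, chosen only to show how a memory requirement that no single site can meet might be satisfied by combining sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A hypothetical CAP-enabled cluster; the fields are illustrative only."""
    name: str
    free_nodes: int
    mem_per_node_gb: int

    @property
    def free_mem_gb(self) -> int:
        return self.free_nodes * self.mem_per_node_gb


def select_sites(sites, required_mem_gb):
    """Greedily pick the fewest sites whose combined free memory meets the request.

    Returns the chosen sites, or an empty list if the whole pool is too small.
    """
    chosen, total = [], 0
    # Take the largest sites first so the job spans as few sites as possible.
    for site in sorted(sites, key=lambda s: s.free_mem_gb, reverse=True):
        if total >= required_mem_gb:
            break
        if site.free_mem_gb == 0:
            continue  # a fully busy site contributes nothing
        chosen.append(site)
        total += site.free_mem_gb
    return chosen if total >= required_mem_gb else []


if __name__ == "__main__":
    # Assumed example pool; real site capacities are not given in the abstract.
    pool = [
        Site("site-a", free_nodes=16, mem_per_node_gb=8),
        Site("site-b", free_nodes=32, mem_per_node_gb=4),
        Site("site-c", free_nodes=8, mem_per_node_gb=4),
    ]
    for site in select_sites(pool, required_mem_gb=200):
        print(site.name, site.free_mem_gb)
```

A production metascheduler would also have to weigh queue wait times, inter-site network performance, and resource heterogeneity, which the abstract lists among the open challenges; this sketch considers free memory only.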
Several significant issues and challenges affect the practical deployment of CAP: load balancing, routing, resource heterogeneity, and network performance. To explore these issues, a prototype involving the SURAgrid resources at Old Dominion University and the University of Alabama at Birmingham will be developed. The prototype leverages existing software solutions in a coordinated infrastructure to minimize development effort, uses the SURAgrid infrastructure to provide simplified job control through the SURAgrid portal, and enables inter-institutional resource allocation through the SURAgrid authentication and authorization mechanism.
This paper will disseminate lessons learned through the construction of this prototype and share experiences and perspectives on next-step development. The goal of CAP is to provide immediately useful benefits to the growing SURAgrid application community in a way that also serves as a model for effective integration of existing technologies for the wider community of grid users and developers.

Published In

MG '08: Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities
January 2008
178 pages
ISBN: 9781595938350
DOI: 10.1145/1341811

Sponsors

  • National e-Science Institute (Edinburgh, UK)
  • Louisiana State University (USA)

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

Mardi Gras'08
Mardi Gras'08: 15th Mardi Gras Conference on Distributed Applications
January 29 - February 3, 2008
Baton Rouge, Louisiana, USA
