Falkon: a Fast and Light-weight tasK executiON framework

Authors:
Ioan Raicu, University of Chicago, IL
Yong Zhao, University of Chicago, IL
Catalin Dumitrescu, University of Chicago, IL
Ian Foster, University of Chicago and Argonne National Laboratory, Argonne, IL
Mike Wilde, University of Chicago and Argonne National Laboratory, Argonne, IL

SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007, Article No.: 43, Pages 1–12
https://doi.org/10.1145/1362622.1362680

Published: 10 November 2007

ABSTRACT

To enable the rapid execution of many tasks on compute clusters, we have developed Falkon, a Fast and Light-weight tasK executiON framework. Falkon integrates (1) multi-level scheduling to separate resource acquisition (via, e.g., requests to batch schedulers) from task dispatch, and (2) a streamlined dispatcher. Falkon's integration of multi-level scheduling and streamlined dispatchers delivers performance not provided by any other system. We describe Falkon architecture and implementation, and present performance results for both microbenchmarks and applications. Microbenchmarks show that Falkon throughput (487 tasks/sec) and scalability (to 54,000 executors and 2,000,000 tasks processed in just 112 minutes) are one to two orders of magnitude better than other systems used in production Grids. Large-scale astronomy and medical applications executed under Falkon by the Swift parallel programming system achieve up to 90% reduction in end-to-end run time, relative to versions that execute tasks via separate scheduler submissions.
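
The separation the abstract describes, between acquiring resources and dispatching tasks to them, can be illustrated with a minimal sketch. This is not Falkon's implementation or API: `acquire_executors` and `run_tasks` are hypothetical names, the batch scheduler is replaced by a stub, and executors are modeled as local threads pulling from a shared in-memory queue.

```python
import queue
import threading


def acquire_executors(n):
    """Level 1 (assumed stand-in): resource acquisition.

    A real system would request nodes from a batch scheduler; this stub
    simply returns n executor names so the cost is paid once, up front.
    """
    return [f"executor-{i}" for i in range(n)]


def run_tasks(tasks, executors):
    """Level 2: dispatch tasks to executors that have already been acquired.

    tasks is a list of (task_id, func, args) tuples. Each executor pulls
    work from a shared queue until the queue is empty.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def executor_loop(name):
        while True:
            try:
                task_id, func, args = pending.get_nowait()
            except queue.Empty:
                return  # no work left, so this executor exits
            value = func(*args)
            with lock:
                results[task_id] = (name, value)

    threads = [
        threading.Thread(target=executor_loop, args=(name,))
        for name in executors
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    executors = acquire_executors(4)  # acquisition happens once
    tasks = [(i, pow, (i, 2)) for i in range(100)]
    results = run_tasks(tasks, executors)  # many cheap dispatches
    print(len(results), "tasks completed")
```

The point of the design is that the expensive step (acquisition) runs once, while each task costs only a queue operation. By contrast, the baseline mentioned in the abstract submits every task separately to the scheduler.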

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Designing software
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems

Published in
SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007
723 pages
ISBN: 9781595937643
DOI: 10.1145/1362622
General Chair:
Becky Verastegui
Oak Ridge National Laboratory
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dynamic resource provisioning
grid computing
parallel programming
scheduling
Qualifiers
- research-article
Acceptance Rates
SC '07 Paper Acceptance Rate: 54 of 268 submissions, 20%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
Article Metrics
- Total Citations: 203
- Total Downloads: 541
- Downloads (Last 12 months): 21
- Downloads (Last 6 weeks): 6