AME: an anyscale many-task computing engine

Authors:
Zhao Zhang, University of Chicago, Chicago, IL, USA
Daniel S. Katz, University of Chicago & Argonne National Laboratory, Chicago, IL, USA
Matei Ripeanu, University of British Columbia, Vancouver, BC, Canada
Michael Wilde, University of Chicago & Argonne National Laboratory, Chicago, IL, USA
Ian T. Foster, University of Chicago & Argonne National Laboratory, Chicago, IL, USA

WORKS '11: Proceedings of the 6th workshop on Workflows in support of large-scale science, November 2011, Pages 137–146
https://doi.org/10.1145/2110497.2110513

Published: 14 November 2011

ABSTRACT

Many-Task Computing (MTC) is a new application category that encompasses increasingly popular applications in biology, economics, and statistics. The high inter-task parallelism and data-intensive processing demands of these applications pose new challenges to existing supercomputer hardware-software stacks. These challenges include resource provisioning; task dispatching, dependency resolution, and load balancing; data management; and resilience.

This paper examines the characteristics of MTC applications that create these challenges, and identifies related gaps in the middleware that supports these applications on extreme-scale systems. Based on this analysis, we propose AME, an Anyscale MTC Engine, which addresses the scalability aspects of these gaps. We describe the AME framework and present performance results for both synthetic benchmarks and real applications. Our results show that AME's dispatching performance scales linearly up to 14,120 tasks/second on 16,384 cores with high efficiency. The overhead of the intermediate data management scheme does not increase significantly up to 16,384 cores. AME eliminates 73% of the file transfer between compute nodes and the global filesystem for the Montage astronomy application running on 2,048 cores. Our results indicate that AME scales well on today's petascale machines, and is a strong candidate for exascale machines.
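
To put the reported dispatch rate in perspective, the following back-of-the-envelope sketch converts it into a per-core rate and into the shortest average task length a dispatcher at that rate could sustain without leaving cores idle. It is not taken from the paper: the function names are hypothetical, and the calculation assumes one task per core at a time and a dispatcher that is the only bottleneck.

```python
# Illustrative arithmetic only; names and the one-task-per-core
# assumption are not from the paper.

DISPATCH_RATE = 14_120.0  # tasks/second, as reported in the abstract
CORES = 16_384            # core count, as reported in the abstract


def per_core_rate(tasks_per_second: float, cores: int) -> float:
    """Tasks dispatched per second, averaged over each core."""
    return tasks_per_second / cores


def min_sustainable_task_seconds(tasks_per_second: float, cores: int) -> float:
    """Shortest mean task duration that keeps every core busy.

    Assumes one task per core at a time: with `cores` tasks in flight and
    a dispatcher issuing `tasks_per_second`, the mean task must last at
    least cores / tasks_per_second seconds, or cores wait on the dispatcher.
    """
    return cores / tasks_per_second


if __name__ == "__main__":
    rate = per_core_rate(DISPATCH_RATE, CORES)
    floor = min_sustainable_task_seconds(DISPATCH_RATE, CORES)
    print(f"per-core dispatch rate: {rate:.3f} tasks/s")
    print(f"shortest sustainable mean task: {floor:.3f} s")
```

Under these assumptions the per-core rate is roughly 0.86 tasks/second, so tasks averaging about 1.16 seconds or longer could keep all 16,384 cores occupied.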


Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments

Published in
WORKS '11: Proceedings of the 6th workshop on Workflows in support of large-scale science
November 2011
154 pages
ISBN: 9781450311007
DOI: 10.1145/2110497
General Chairs: Ian Taylor (Cardiff University, UK), Johan Montagnat (CNRS, France)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data management
load balancing
many-task computing
scheduling
supercomputer systems
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 30 of 54 submissions, 56%