ABSTRACT
The probability that a hardware failure occurs during the execution of application software continues to increase with the scale of modern systems. Existing parallel development approaches cannot effectively recover from these failures except by means of expensive checkpoint/restart files. As a result, many CPU hours of scientific simulation are lost to hardware failures.
Relentless Computing is a data-oriented approach to software development that allows many classes of distributed and parallel problems, from those requiring no data sharing to those with intensive data sharing, to be solved in both loosely and tightly coupled environments. Because a process requires no knowledge of the current runtime status of any other process to begin contributing, the execution pool can shrink and grow, and can recover from hardware failures, automatically.
We present the motivation for the development of Relentless Computing, describe how it works, give examples of using Relentless Computing to solve several types of problems, and report initial scaling results.
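To make the data-oriented model described above concrete, the sketch below is our own illustration, not code from the poster: workers coordinate only through a shared task store, never with each other, so any worker can join, leave, or fail at any time, and an expired lease lets a surviving worker redo a failed worker's task. The names TaskStore, claim, and complete are hypothetical, and a lock-protected dict stands in for what would be a distributed data store.

```python
# Hypothetical sketch of a data-oriented worker pool: processes share only
# a task store, so the pool can shrink, grow, and recover from failures.
import threading
import time

class TaskStore:
    """Shared store of work units. A real deployment would use a
    distributed key-value store; a lock-protected dict stands in here."""
    def __init__(self, tasks):
        self._lock = threading.Lock()
        # task id -> {"state": "pending" | "done", "lease": expiry time}
        self._tasks = {t: {"state": "pending", "lease": 0.0} for t in tasks}
        self.results = {}

    def claim(self, lease_seconds=2.0):
        """Atomically claim one pending (or lease-expired) task.
        A crashed worker's claim expires, so another worker can redo it."""
        now = time.monotonic()
        with self._lock:
            for tid, meta in self._tasks.items():
                if meta["state"] == "pending" and meta["lease"] < now:
                    meta["lease"] = now + lease_seconds
                    return tid
        return None

    def complete(self, tid, result):
        """Record a result idempotently; duplicate completions are harmless."""
        with self._lock:
            self._tasks[tid]["state"] = "done"
            self.results.setdefault(tid, result)

def worker(store):
    # A real worker would poll and retry; for this demo, exit when no
    # claimable task remains.
    while (tid := store.claim()) is not None:
        store.complete(tid, tid * tid)  # stand-in for real computation

if __name__ == "__main__":
    store = TaskStore(range(20))
    threads = [threading.Thread(target=worker, args=(store,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(store.results.items()))
```

Because workers never exchange state directly, adding a fifth worker mid-run, or losing one, requires no coordination; idempotent completion is what makes re-execution after a lease expiry safe.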
Index Terms
- Poster: The relentless computing paradigm: a data-oriented programming model for distributed-memory computation