DOI: 10.1145/2148600.2148646

Poster: FOX: a fault-oblivious extreme scale execution environment

Published: 12 November 2011

Abstract

Exascale computing systems will provide a thousand-fold increase in parallelism and a proportional increase in failure rate relative to today's machines. Systems software for exascale machines must provide the infrastructure to support existing applications while simultaneously enabling efficient execution of new programming models that naturally express dynamic, adaptive, irregular computation; coupled simulations; and massive data analysis in a highly unreliable hardware environment with billions of threads of execution. Further, these systems must be designed with failure in mind. FOX is a new system for the exascale that will support distributed data objects as first-class objects in the operating system itself. This memory-based data store will be named and accessed as part of the file system name space of the application. We can build many types of objects with this data store, including data-driven work queues, which will in turn support applications with inherent resilience.
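
As a rough illustration of the idea of a data-driven work queue layered on a named, memory-based data store, the following Python sketch may help. It is not FOX's interface: the class names `NamedStore` and `WorkQueue`, the path layout under `/queue`, and the `recover` step are all hypothetical, and the store is a single-process dictionary rather than a distributed object. The point it tries to show is that when task state lives in the store rather than in the worker, a task claimed by a failed worker can simply be returned to the pending set and re-executed.

```python
class NamedStore:
    """Toy in-memory object store keyed by path-like names (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, name, value):
        self._objects[name] = value

    def get(self, name):
        return self._objects[name]

    def delete(self, name):
        del self._objects[name]

    def rename(self, old, new):
        # Move an object to a new name without copying its value.
        self._objects[new] = self._objects.pop(old)

    def list(self, prefix):
        # Return all names under a prefix, in sorted order.
        return sorted(n for n in self._objects if n.startswith(prefix))


class WorkQueue:
    """Work queue whose entire state lives in the store (assumed layout)."""

    def __init__(self, store, root="/queue"):
        self.store = store
        self.root = root

    def submit(self, task_id, payload):
        self.store.put(f"{self.root}/pending/{task_id}", payload)

    def claim(self, worker):
        """Move the first pending task under the worker's name, or return None."""
        pending = self.store.list(f"{self.root}/pending/")
        if not pending:
            return None
        task_id = pending[0].rsplit("/", 1)[1]
        claimed = f"{self.root}/claimed/{worker}/{task_id}"
        self.store.rename(pending[0], claimed)
        return task_id, self.store.get(claimed)

    def complete(self, worker, task_id, result):
        """Record the result and drop the claim."""
        self.store.delete(f"{self.root}/claimed/{worker}/{task_id}")
        self.store.put(f"{self.root}/done/{task_id}", result)

    def recover(self, worker):
        """Return every task held by a failed worker to the pending set."""
        held = self.store.list(f"{self.root}/claimed/{worker}/")
        for name in held:
            task_id = name.rsplit("/", 1)[1]
            self.store.rename(name, f"{self.root}/pending/{task_id}")
        return len(held)
```

In this sketch a supervisor that detects the loss of worker `w0` would call `recover("w0")`, after which any surviving worker can claim the same task again. Failure detection, replication of the store itself, and concurrent access are deliberately left out.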


Published In

SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
November 2011
166 pages
ISBN: 9781450310307
DOI: 10.1145/2148600

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tag

fault-oblivious

Qualifiers

Poster

Conference

SC '11

Acceptance Rates

Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%
