MOLAR: adaptive runtime support for high-end computing operating and runtime systems

Published: 01 April 2006

Abstract

MOLAR is a multi-institutional research effort that concentrates on adaptive, reliable, and efficient operating and runtime system (OS/R) solutions for ultra-scale high-end scientific computing on the next generation of supercomputers. This research addresses the challenges outlined in the FAST-OS (Forum to Address Scalable Technology for Runtime and Operating Systems) and HECRTF (High-End Computing Revitalization Task Force) activities by exploring the use of advanced monitoring and adaptation to improve application performance and the predictability of system interruptions, and by advancing computer reliability, availability, and serviceability (RAS) management systems to work cooperatively with the OS/R to identify and preemptively resolve system issues. This paper describes recent research by the MOLAR team in advancing RAS for high-end computing OS/Rs.
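The RAS theme above centers on monitoring node health so that failures can be anticipated and handled preemptively rather than merely survived. As a purely illustrative sketch of that idea (this is not the MOLAR implementation; the node names, the timeout threshold, and the record_heartbeat/is_suspect helpers are all hypothetical), a minimal heartbeat-style failure detector in C could look like the following:

    /* Minimal sketch of a heartbeat-based health monitor, in the spirit of
     * the RAS monitoring described in the abstract. All names and thresholds
     * here are hypothetical, not taken from MOLAR. */
    #include <stdio.h>
    #include <time.h>

    #define NUM_NODES 4
    #define TIMEOUT_SECONDS 5   /* hypothetical threshold for declaring a node suspect */

    typedef struct {
        const char *name;        /* hypothetical node identifier */
        time_t last_heartbeat;   /* time of the most recent heartbeat received */
    } node_t;

    /* Record the current time as the node's most recent heartbeat. */
    static void record_heartbeat(node_t *node) {
        node->last_heartbeat = time(NULL);
    }

    /* A node is suspect when its last heartbeat is older than the timeout. */
    static int is_suspect(const node_t *node, time_t now) {
        return (now - node->last_heartbeat) > TIMEOUT_SECONDS;
    }

    int main(void) {
        node_t nodes[NUM_NODES] = {
            { "node0", 0 }, { "node1", 0 }, { "node2", 0 }, { "node3", 0 }
        };

        /* Simulate one round of heartbeats; node2 stays silent. */
        record_heartbeat(&nodes[0]);
        record_heartbeat(&nodes[1]);
        record_heartbeat(&nodes[3]);

        /* Sweep: flag any node whose heartbeat has expired, so a RAS layer
         * could act (checkpoint, migrate) before a hard failure occurs. */
        time_t now = time(NULL) + TIMEOUT_SECONDS + 1;  /* fast-forward for the demo */
        for (int i = 0; i < NUM_NODES; i++) {
            if (is_suspect(&nodes[i], now))
                printf("%s: heartbeat expired, flagging for preemptive action\n",
                       nodes[i].name);
            else
                printf("%s: healthy\n", nodes[i].name);
        }
        return 0;
    }

A production RAS layer would feed such "suspect" events into a policy engine that triggers checkpointing or migration rather than printing them, but the detect-then-act split shown here is the core pattern.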

Published In

ACM SIGOPS Operating Systems Review, Volume 40, Issue 2
April 2006, 107 pages
ISSN: 0163-5980
DOI: 10.1145/1131322

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2006
Published in SIGOPS Volume 40, Issue 2


Author Tags

  1. RAS
  2. availability
  3. fault tolerance
  4. group membership
  5. high-end computing
  6. monitoring
  7. reliability

Cited By
  • Holistic aggregate resource environment. ACM SIGOPS Operating Systems Review, 42(1):85-91, January 2008. DOI: 10.1145/1341312.1341327
  • Symmetric active/active metadata service for highly available cluster storage systems. Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, pages 404-409, November 2007. DOI: 10.5555/1647539.1647613
  • A Fast Delivery Protocol for Total Order Broadcasting. 2007 16th International Conference on Computer Communications and Networks, pages 730-734, August 2007. DOI: 10.1109/ICCCN.2007.4317904
  • Reliability-aware resource allocation in HPC systems. Proceedings of the 2007 IEEE International Conference on Cluster Computing, pages 312-321, September 2007. DOI: 10.1109/CLUSTR.2007.4629245
  • Transparent Symmetric Active/Active Replication for Service-Level High Availability. Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid, pages 755-760, May 2007. DOI: 10.1109/CCGRID.2007.116
  • JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management. 2006 IEEE International Conference on Cluster Computing, pages 1-10, September 2006. DOI: 10.1109/CLUSTR.2006.311855
