OPIOM: Off-Processor I/O with Myrinet

https://doi.org/10.1016/S0167-739X(01)00074-7

Abstract

As processors become more powerful and clusters larger, users will exploit this increased power to run ever larger problems. Today's datasets in biology, physics or multimedia applications are huge and require high-performance storage sub-systems. As a result, the hot spot of cluster computing is gradually moving from high-performance computing to high-performance storage I/O. The solutions proposed by the parallel file-system community try to improve performance either by working at the kernel level to enhance the regular I/O design or by using a dedicated Storage Area Network such as Fibre Channel. We propose a new design that merges the communication network and the storage network at minimal cost. We have implemented it on the Myrinet interconnect as OPIOM: OPIOM moves data asynchronously from SCSI disks to the embedded memory of a Myrinet interface in order to send it to a remote node. This design offers two attractive features: high performance and extremely low host overhead.
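
As a rough illustration of the data path just described, the C sketch below shows what a server-side OPIOM-style request could look like from the application's point of view. The names (opiom_open, opiom_send_file, opiom_wait) and the stub bodies are hypothetical and chosen for illustration only; the actual OPIOM programming interface is not reproduced in this excerpt. The point is that the host merely posts a descriptor and the Myrinet NIC moves the blocks from the SCSI controller into its embedded memory before sending them, leaving the host CPU free.

    /* Hypothetical user-level sketch: post one descriptor and let the NIC do
     * the disk -> NIC memory -> network transfer. The stub bodies only stand
     * in for the real driver/NIC interaction. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    typedef struct { const char *partition; } opiom_handle_t;  /* raw disk partition */
    typedef struct { int done; } opiom_req_t;                  /* pending transfer   */

    static opiom_handle_t *opiom_open(const char *partition)
    {
        opiom_handle_t *h = malloc(sizeof *h);
        if (h) h->partition = partition;      /* real code would open the raw device */
        return h;
    }

    static opiom_req_t *opiom_send_file(opiom_handle_t *h, off_t off, size_t len, int node)
    {
        /* Real code would hand (device, offset, length, destination) to the NIC
         * and return immediately; the host CPU never touches the payload. */
        printf("post: %zu bytes of %s at offset %lld -> node %d\n",
               len, h->partition, (long long)off, node);
        opiom_req_t *r = malloc(sizeof *r);
        if (r) r->done = 1;
        return r;
    }

    static void opiom_wait(opiom_req_t *r) { while (!r->done) ; }

    int main(void)
    {
        opiom_handle_t *h = opiom_open("/dev/sda3");      /* raw SCSI partition */
        opiom_req_t *req = opiom_send_file(h, 0, 1 << 20, 4);
        /* ... the host is free to compute while the transfer proceeds ... */
        opiom_wait(req);
        free(req);
        free(h);
        return EXIT_SUCCESS;
    }

Because the request is asynchronous, the disk read, the network transfer and the host computation can overlap, which is where the extremely low host overhead comes from.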

Introduction

The availability of powerful microprocessors and high-speed networks as commodity components makes clusters an appealing solution for cost-effective high-performance computing. However, the bottleneck for users' applications tends to shift from computation and communication to I/O: problem sizes keep growing, and the time needed to load datasets into the cluster work pool and to write results back to disk can no longer be neglected.

The new generation of commodity components, such as storage controllers and high-speed networks, can be used to break architectural limitations inherited from the past while keeping the price/performance ratio as low as possible. Our work improves a basic building block of parallel I/O on clusters by removing one of these bottlenecks: the forced detour of disk data through host memory.

Section 2 presents the motivation for this work, a current limitation of the parallel I/O design for clusters, and related work that tries to address it. Section 3 proposes our new design, describes our contribution, and details the issues that arose during the implementation. Section 4 then reports experimental benchmarks that highlight the benefit of our approach for parallel I/O implementations. Finally, Section 5 concludes and presents the short- and medium-term perspectives of our project.

Section snippets

Motivation

Today's clusters are larger and more powerful than ever. They are starting to be used for Grand Challenge problems in genomics and nuclear simulation, and for intensive multimedia applications such as Video-on-Demand (VOD). The datasets used in these contexts are very large and require the I/O sub-system to be as efficient as the computation and communication components.

There are two ways to achieve high-performance I/O in a cluster environment:

  • To use a dedicated Storage Area Network…

Contribution

The data path between the storage controller and the network interface passes through host memory, even though the data is not processed by the main processor before being sent back to the I/O request emitter. The data goes through the host because of system constraints: interactions with a local storage controller are traditionally driven from a user application, and the communication interface of the NIC assumes the data to be present in main memory at the beginning of a…
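
For contrast, the conventional path criticized here can be summarized by the following sketch: the block is first read from the SCSI disk into a staging buffer in host memory and only then handed to the network layer, even though the CPU never looks at the payload. The socket-based send is only a stand-in; a Myrinet user-level API would replace the write() call but would still expect the data to sit in host memory first.

    /* Regular server-side I/O path: disk -> host memory -> network.
     * Every byte crosses the host memory bus at least twice. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    enum { CHUNK = 64 * 1024 };

    /* Ship `len` bytes of the file `path` to the peer behind `sock_fd`
     * (any connected socket descriptor). */
    int serve_file(const char *path, size_t len, int sock_fd)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return -1; }

        char *buf = malloc(CHUNK);                 /* staging buffer in host memory */
        if (!buf) { close(fd); return -1; }

        size_t left = len;
        while (left > 0) {
            ssize_t n = read(fd, buf, left < CHUNK ? left : CHUNK); /* disk -> host RAM */
            if (n <= 0) break;
            for (ssize_t sent = 0; sent < n; ) {                    /* host RAM -> NIC  */
                ssize_t w = write(sock_fd, buf + sent, n - sent);
                if (w < 0) { perror("write"); free(buf); close(fd); return -1; }
                sent += w;
            }
            left -= n;
        }
        free(buf);
        close(fd);
        return left == 0 ? 0 : -1;
    }

OPIOM removes the staging buffer: the descriptor goes to the NIC, and the payload never enters host memory.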

Experimentation

We have conducted experiments with OPIOM to validate the implementation and measure the performance gain versus the regular I/O implementation.
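
The benchmark figures themselves are not reproduced in this excerpt, but the two quantities that matter in such a comparison are the achieved throughput and the fraction of host CPU time consumed during the transfer. A minimal harness along the following lines (the names bench_transfer and transfer_fn are ours, not from the paper) makes the distinction visible: an off-processor path shows up as high bandwidth with near-zero host CPU seconds, while the regular path burns CPU time copying data.

    /* Time one transfer with a wall clock and ask the kernel how much CPU
     * time the host process actually spent during it. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/time.h>

    static double seconds(struct timeval tv)
    {
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    /* Run transfer_fn(bytes) once and report bandwidth and host CPU usage. */
    void bench_transfer(const char *label, size_t bytes, void (*transfer_fn)(size_t))
    {
        struct rusage ru0, ru1;
        struct timeval t0, t1;

        getrusage(RUSAGE_SELF, &ru0);
        gettimeofday(&t0, NULL);

        transfer_fn(bytes);                       /* the data movement under test */

        gettimeofday(&t1, NULL);
        getrusage(RUSAGE_SELF, &ru1);

        double wall = seconds(t1) - seconds(t0);
        double cpu  = (seconds(ru1.ru_utime) + seconds(ru1.ru_stime))
                    - (seconds(ru0.ru_utime) + seconds(ru0.ru_stime));

        printf("%-12s %8.1f MB/s   host CPU %5.1f%% of wall time\n",
               label, (double)bytes / (1 << 20) / wall, 100.0 * cpu / wall);
    }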

Conclusions and perspectives

Parallel I/O is a very important research domain for high-performance cluster computing. Today's clusters cannot be used to their full capacity because their I/O performance is disappointing compared to their computational power. We have designed a basic interface, under Linux, to optimize data movement between disks and an intelligent network interface. Our implementation with SCSI and Myrinet, OPIOM, provides high throughput and very low host overhead, as well as a UNIX-like…

Patrick Geoffray received his PhD from the University of Lyon (France) in 2001. He is currently a senior programmer at Myricom's branch office in Oak Ridge, TN, USA. He is in charge of the high-performance middleware on top of Myrinet: MPI, VIA, SHMEM. His interests include high-speed interconnects, message passing, and high-performance storage I/O.

This work was carried out with the help of Sycomore-Aerospatiale Matra in France, and of Jack Dongarra and Rich Wolski at the University of Tennessee, USA.

1. URL: http://www.myri.com.
