OPIOM: Off-Processor I/O with Myrinet

https://doi.org/10.1016/S0167-739X(01)00074-7

Abstract

As processors become more powerful and clusters larger, users will exploit this increased power to run ever larger problems. Today's datasets in biology, physics or multimedia applications are huge and require high-performance storage sub-systems. As a result, the hot spot of cluster computing is gradually moving from high-performance computing to high-performance storage I/O. The solutions proposed by the parallel file-system community try to improve performance either by working at the kernel level to enhance the regular I/O design or by using a dedicated Storage Area Network such as Fibre Channel. We propose a new design that merges the communication network and the storage network at minimal cost. We have implemented it on the Myrinet interconnect as OPIOM: OPIOM moves data asynchronously from SCSI disks to the embedded memory of a Myrinet interface in order to send it to a remote node. This design offers two attractive features: high performance and extremely low host overhead.
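
As a rough illustration of the data path just described, the C sketch below shows what a server-side OPIOM-style request could look like from the application's point of view. The names (opiom_open, opiom_send_file, opiom_wait) and the stub bodies are hypothetical and chosen for illustration only; the actual OPIOM programming interface is not reproduced in this excerpt. The point is that the host merely posts a descriptor and the Myrinet NIC moves the blocks from the SCSI controller into its embedded memory before sending them, leaving the host CPU free.

    /* Hypothetical user-level sketch: post one descriptor and let the NIC do
     * the disk -> NIC memory -> network transfer. The stub bodies only stand
     * in for the real driver/NIC interaction. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    typedef struct { const char *partition; } opiom_handle_t;  /* raw disk partition */
    typedef struct { int done; } opiom_req_t;                  /* pending transfer   */

    static opiom_handle_t *opiom_open(const char *partition)
    {
        opiom_handle_t *h = malloc(sizeof *h);
        if (h) h->partition = partition;      /* real code would open the raw device */
        return h;
    }

    static opiom_req_t *opiom_send_file(opiom_handle_t *h, off_t off, size_t len, int node)
    {
        /* Real code would hand (device, offset, length, destination) to the NIC
         * and return immediately; the host CPU never touches the payload. */
        printf("post: %zu bytes of %s at offset %lld -> node %d\n",
               len, h->partition, (long long)off, node);
        opiom_req_t *r = malloc(sizeof *r);
        if (r) r->done = 1;
        return r;
    }

    static void opiom_wait(opiom_req_t *r) { while (!r->done) ; }

    int main(void)
    {
        opiom_handle_t *h = opiom_open("/dev/sda3");      /* raw SCSI partition */
        opiom_req_t *req = opiom_send_file(h, 0, 1 << 20, 4);
        /* ... the host is free to compute while the transfer proceeds ... */
        opiom_wait(req);
        free(req);
        free(h);
        return EXIT_SUCCESS;
    }

Because the request is asynchronous, the disk read, the network transfer and the host computation can overlap, which is where the extremely low host overhead comes from.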

Introduction

The availability of powerful microprocessors and high-speed networks as commodity components makes clusters an appealing solution for cost-effective high-performance computing. However, the bottleneck for users' applications tends to shift from computation and communication to I/O: problem sizes keep growing, and the time needed to load datasets into the cluster work pool and to write results back to disk can no longer be neglected.

The new generation of commodity components, such as storage controllers and high-speed networks, can be used to break architectural limitations inherited from the past while keeping the price/performance ratio as low as possible. Our work improves a basic building block of parallel I/O on clusters by removing one of these bottlenecks: the forced detour of disk data through host memory.

Section 2 presents the motivation for this work, a current limitation of the parallel I/O design for clusters, and related work that tries to address it. Section 3 proposes our new design, describes our contribution, and details the issues that arose during the implementation. Section 4 then reports experimental benchmarks that highlight the benefit of our approach for parallel I/O implementations. Finally, Section 5 concludes and presents the short- and medium-term perspectives of our project.

Section snippets

Motivation

Today's clusters are larger and more powerful than ever. They are starting to be used for Grand Challenge problems in genomics and nuclear simulation, and for intensive multimedia applications such as Video-on-Demand (VOD). The datasets used in these contexts are very large and require the I/O sub-system to be as efficient as the computation and communication components.

There are two ways to achieve high-performance I/O in a cluster environment:

  • To use a dedicated Storage Area Network…

Contribution

The data path between the storage controller and the network interface passes through host memory, even though the data is not processed by the main processor before being sent back to the I/O request emitter. The data goes through the host because of system constraints: interactions with a local storage controller are traditionally driven from a user application, and the communication interface of the NIC assumes the data to be present in main memory at the beginning of a…
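
For contrast, the conventional path criticized here can be summarized by the following sketch: the block is first read from the SCSI disk into a staging buffer in host memory and only then handed to the network layer, even though the CPU never looks at the payload. The socket-based send is only a stand-in; a Myrinet user-level API would replace the write() call but would still expect the data to sit in host memory first.

    /* Regular server-side I/O path: disk -> host memory -> network.
     * Every byte crosses the host memory bus at least twice. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    enum { CHUNK = 64 * 1024 };

    /* Ship `len` bytes of the file `path` to the peer behind `sock_fd`
     * (any connected socket descriptor). */
    int serve_file(const char *path, size_t len, int sock_fd)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); return -1; }

        char *buf = malloc(CHUNK);                 /* staging buffer in host memory */
        if (!buf) { close(fd); return -1; }

        size_t left = len;
        while (left > 0) {
            ssize_t n = read(fd, buf, left < CHUNK ? left : CHUNK); /* disk -> host RAM */
            if (n <= 0) break;
            for (ssize_t sent = 0; sent < n; ) {                    /* host RAM -> NIC  */
                ssize_t w = write(sock_fd, buf + sent, n - sent);
                if (w < 0) { perror("write"); free(buf); close(fd); return -1; }
                sent += w;
            }
            left -= n;
        }
        free(buf);
        close(fd);
        return left == 0 ? 0 : -1;
    }

OPIOM removes the staging buffer: the descriptor goes to the NIC, and the payload never enters host memory.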

Experimentation

We have conducted experiments with OPIOM to validate the implementation and measure the performance gain versus the regular I/O implementation.
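
The benchmark figures themselves are not reproduced in this excerpt, but the two quantities that matter in such a comparison are the achieved throughput and the fraction of host CPU time consumed during the transfer. A minimal harness along the following lines (the names bench_transfer and transfer_fn are ours, not from the paper) makes the distinction visible: an off-processor path shows up as high bandwidth with near-zero host CPU seconds, while the regular path burns CPU time copying data.

    /* Time one transfer with a wall clock and ask the kernel how much CPU
     * time the host process actually spent during it. */
    #include <stdio.h>
    #include <sys/resource.h>
    #include <sys/time.h>

    static double seconds(struct timeval tv)
    {
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    /* Run transfer_fn(bytes) once and report bandwidth and host CPU usage. */
    void bench_transfer(const char *label, size_t bytes, void (*transfer_fn)(size_t))
    {
        struct rusage ru0, ru1;
        struct timeval t0, t1;

        getrusage(RUSAGE_SELF, &ru0);
        gettimeofday(&t0, NULL);

        transfer_fn(bytes);                       /* the data movement under test */

        gettimeofday(&t1, NULL);
        getrusage(RUSAGE_SELF, &ru1);

        double wall = seconds(t1) - seconds(t0);
        double cpu  = (seconds(ru1.ru_utime) + seconds(ru1.ru_stime))
                    - (seconds(ru0.ru_utime) + seconds(ru0.ru_stime));

        printf("%-12s %8.1f MB/s   host CPU %5.1f%% of wall time\n",
               label, (double)bytes / (1 << 20) / wall, 100.0 * cpu / wall);
    }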

Conclusions and perspectives

Parallel I/O is a very important research domain for high-performance cluster computing. Today's clusters cannot be used to their full capacity because their I/O performance is disappointing compared to their computational power. We have designed a basic interface, under Linux, to optimize data movement between disks and an intelligent network interface. Our implementation with SCSI and Myrinet, OPIOM, provides high throughput and very low host overhead, as well as a UNIX-like…

Patrick Geoffray received his PhD from the University of Lyon (France) in 2001. He is currently a senior programmer at Myricom's branch office in Oak Ridge, TN, USA. He is in charge of the high-performance middleware on top of Myrinet: MPI, VIA, SHMEM. His interests include high-speed interconnects, message passing, and high-performance storage I/O.

This work was carried out with the help of Sycomore-Aerospatiale Matra in France, and of Jack Dongarra and Rich Wolski at the University of Tennessee, USA.

1. URL: http://www.myri.com.
