ABSTRACT
Parallel I/O needs to keep pace with the demands of high performance computing applications on systems of ever-increasing speed. Exploiting high-end interconnect technologies to reduce network access cost and scale aggregate bandwidth is one way to increase the performance of storage systems. In this paper, we explore the challenges of supporting a parallel file system with modern features of Quadrics, including user-level communication and RDMA operations. We design and implement a Quadrics-capable version of a parallel file system (PVFS2). Our design overcomes the challenges that the static communication model of Quadrics imposes on dynamic client/server architectures. Quadrics QDMA and RDMA mechanisms are integrated and optimized for high performance data communication. Zero-copy PVFS2 list I/O is achieved with a Single Event Associated MUltiple RDMA (SEAMUR) mechanism. Experimental results indicate that PVFS2, with Quadrics user-level protocols and RDMA operations, performs significantly better in both data transfer and management operations. With four I/O server nodes, our implementation improves PVFS2 aggregate read bandwidth by up to 140% compared to PVFS2 over TCP on top of the Quadrics IP implementation. Moreover, it delivers significant performance improvements to application benchmarks such as mpi-tile-io [24] and BTIO [26]. To the best of our knowledge, this is the first work in the literature to report the design of a high performance parallel file system over Quadrics user-level communication protocols.
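The abstract names SEAMUR only briefly, so the following is a minimal sketch of the general idea the name suggests: several transfers share one completion event, which fires once after the last transfer finishes. It is an illustration under stated assumptions, not the paper's implementation. `CountedEvent` and `list_io_write` are hypothetical names, and plain in-memory copies stand in for RDMA writes; no Quadrics API is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CountedEvent:
    """Hypothetical event that fires once after `remaining` completions."""
    remaining: int
    on_complete: Callable[[], None]
    fired: bool = False

    def signal(self) -> None:
        # Record one completed transfer; notify only when all are done.
        if self.fired:
            raise RuntimeError("event already fired")
        self.remaining -= 1
        if self.remaining == 0:
            self.fired = True
            self.on_complete()


def list_io_write(src: bytes,
                  segments: List[Tuple[int, int]],
                  dst: bytearray,
                  event: CountedEvent) -> None:
    """Scatter a contiguous source into noncontiguous (offset, length) segments.

    Each slice assignment stands in for one RDMA write (an assumption made
    for illustration); every write signals the same shared event.
    """
    pos = 0
    for offset, length in segments:
        dst[offset:offset + length] = src[pos:pos + length]
        pos += length
        event.signal()


if __name__ == "__main__":
    notifications: List[str] = []
    segments = [(0, 2), (4, 3), (8, 2)]
    target = bytearray(10)
    event = CountedEvent(remaining=len(segments),
                         on_complete=lambda: notifications.append("done"))
    list_io_write(b"abcdefg", segments, target, event)
    print(notifications, bytes(target))
```

In this sketch the receiver sees a single notification for the whole list I/O request rather than one per segment, which is the property the mechanism's name points to.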
REFERENCES
- The Parallel Virtual File System, version 2. http://www.pvfs.org/pvfs2.
- The Public Netperf Homepage. http://www.netperf.org/netperf/NetperfPage.html.
- J. Beecroft, D. Addison, F. Petrini, and M. McLaren. QsNet-II: An Interconnect for Supercomputing Applications. In Proceedings of Hot Chips '03, Stanford, CA, August 2003.
- N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, and W.-K. Su. Myrinet: A Gigabit-per-Second Local Area Network. IEEE Micro, 15(1):29--36, 1995.
- D. Bonachea, C. Bell, P. Hargrove, and M. Welcome. GASNet 2: An Alternative High-Performance Communication Interface, Nov. 2004.
- P. H. Carns, W. B. Ligon III, R. Ross, and P. Wyckoff. BMI: A Network Abstraction Layer for Parallel I/O, April 2005.
- A. Ching, A. Choudhary, W. Liao, R. Ross, and W. Gropp. Noncontiguous I/O through PVFS. In Proceedings of the IEEE International Conference on Cluster Computing, Chicago, IL, September 2002.
- Cluster File System, Inc. Lustre: A Scalable, High Performance File System. http://www.lustre.org/docs.html.
- D. Nagle, D. Serenyi, and A. Matthews. The Panasas ActiveScale Storage Cluster -- Delivering Scalable High Bandwidth Storage. In Proceedings of Supercomputing '04, November 2004.
- M. DeBergalis, P. Corbett, S. Kleiman, A. Lent, D. Noveck, T. Talpey, and M. Wittle. The Direct Access File System. In Proceedings of the Second USENIX Conference on File and Storage Technologies (FAST '03), 2003.
- J. Duato, S. Yalamanchili, and L. Ni. Interconnection Networks: An Engineering Approach. The IEEE Computer Society Press, 1997.
- J. Huber, C. L. Elford, D. A. Reed, A. A. Chien, and D. S. Blumenthal. PPFS: A High Performance Portable Parallel File System. In Proceedings of the 9th ACM International Conference on Supercomputing, pages 385--394, Barcelona, Spain, July 1995. ACM Press.
- IBM Corp. IBM AIX Parallel I/O File System: Installation, Administration, and Use. Document Number SH34-6065-01, August 1995.
- Infiniband Trade Association. http://www.infinibandta.org.
- Intel Scalable Systems Division. Paragon System User's Guide, May 1995.
- R. Latham, R. Ross, and R. Thakur. The Impact of File Systems on MPI-IO Scalability. In Proceedings of the 11th European PVM/MPI Users' Group Meeting (Euro PVM/MPI 2004), pages 87--96, September 2004.
- J. Liu, M. Banikazemi, B. Abali, and D. K. Panda. A Portable Client/Server Communication Middleware over SANs: Design and Performance Evaluation with InfiniBand. In SAN-02 Workshop (in conjunction with HPCA), February 2003.
- Message Passing Interface Forum. MPI-2: Extensions to the Message-Passing Interface, Jul 1997.
- N. Nieuwejaar and D. Kotz. The Galley Parallel File System. Parallel Computing, (4):447--476, June 1997.
- P. H. Carns, W. B. Ligon III, R. B. Ross, and R. Thakur. PVFS: A Parallel File System for Linux Clusters. In Proceedings of the 4th Annual Linux Showcase and Conference, pages 317--327, Atlanta, GA, October 2000.
- D. A. Patterson, G. Gibson, and R. H. Katz. A Case for Redundant Arrays of Inexpensive Disks. In Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, IL, 1988.
- F. Petrini, W.-C. Feng, A. Hoisie, S. Coll, and E. Frachtenberg. The Quadrics Network: High Performance Clustering Technology. IEEE Micro, 22(1):46--57, January-February 2002.
- Quadrics, Inc. Quadrics Linux Cluster Documentation.
- R. B. Ross. Parallel I/O Benchmarking Consortium. http://www-unix.mcs.anl.gov/rross/pio-benchmark/html/.
- R. Thakur, W. Gropp, and E. Lusk. On Implementing MPI-IO Portably and with High Performance. In Proceedings of the 6th Workshop on I/O in Parallel and Distributed Systems, pages 23--32. ACM Press, May 1999.
- P. Wong and R. F. Van der Wijngaart. NAS Parallel Benchmarks I/O Version 2.4. Technical Report NAS-03-002, Computer Sciences Corporation, NASA Advanced Supercomputing (NAS) Division.
- J. Wu, P. Wyckoff, and D. K. Panda. PVFS over InfiniBand: Design and Performance Evaluation. In Proceedings of the International Conference on Parallel Processing '03, Kaohsiung, Taiwan, October 2003.
- J. Wu, P. Wyckoff, and D. K. Panda. Supporting Efficient Noncontiguous Access in PVFS over InfiniBand. In Proceedings of Cluster Computing '03, Hong Kong, December 2003.
- W. Yu, T. S. Woodall, R. L. Graham, and D. K. Panda. Design and Implementation of Open MPI over Quadrics/Elan4. In Proceedings of the International Conference on Parallel and Distributed Processing Symposium '05, Denver, Colorado, April 2005.
- R. Zahir. Lustre Storage Networking Transport Layer. http://www.lustre.org/docs.html.
- Y. Zhou, A. Bilas, S. Jagannathan, C. Dubnicki, J. F. Philbin, and K. Li. Experiences with VI Communication for Database Storage. In Proceedings of the 29th Annual International Symposium on Computer Architecture, pages 257--268. IEEE Computer Society, 2002.