ABSTRACT
As part of the SCinet Research Sandbox at the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC11), Indiana University used a dedicated 100 Gbps wide area network (WAN) link spanning more than 3,500 km (2,175 mi) to demonstrate the capabilities of the Lustre high performance parallel file system in a high bandwidth, high latency WAN environment. This demonstration functioned as a proof of concept and provided an opportunity to study Lustre's performance over a 100 Gbps WAN. To characterize the performance of the network and file system, a series of benchmarks and tests was undertaken. These included low-level iperf network tests, Lustre networking (LNET) tests, file system tests with the IOR benchmark, and a suite of real-world applications reading from and writing to the file system. All of the benchmarks were run over the WAN link, which had a latency of 50.5 ms. In this article, we describe the configuration and constraints of the demonstration, and focus on the key findings regarding the Lustre networking layer for this extremely high bandwidth, high latency connection. Of particular interest is the relationship between the peer_credits and max_rpcs_in_flight settings when considering LNET performance.
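To give a sense of why the number of concurrent in-flight messages matters on such a link, the following minimal sketch computes the bandwidth-delay product for the figures quoted above. It assumes the 50.5 ms latency is a round-trip time and that each bulk RPC carries 1 MiB; neither assumption is stated in the abstract, and the function names are illustrative only, not part of Lustre or LNET.

```python
import math


def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bandwidth_bps * rtt_s / 8  # divide by 8 to convert bits to bytes


def rpcs_to_fill_pipe(bdp_bytes: float, rpc_size_bytes: int) -> int:
    """Number of fixed-size RPCs that together cover the bandwidth-delay product."""
    return math.ceil(bdp_bytes / rpc_size_bytes)


# Figures from the abstract: 100 Gbps link, 50.5 ms latency.
LINK_BPS = 100e9
RTT_S = 50.5e-3            # ASSUMPTION: the quoted latency is a round-trip time
RPC_SIZE_BYTES = 1 << 20   # ASSUMPTION: 1 MiB per bulk RPC

if __name__ == "__main__":
    bdp = bandwidth_delay_product_bytes(LINK_BPS, RTT_S)
    print(f"Bandwidth-delay product: {bdp / 1e6:.2f} MB")
    print(f"1 MiB RPCs needed in flight: {rpcs_to_fill_pipe(bdp, RPC_SIZE_BYTES)}")
```

Under these assumptions, roughly 630 MB must be outstanding across all clients and servers to saturate the link, so settings that bound concurrency, such as peer_credits and max_rpcs_in_flight, can plausibly cap the achievable throughput.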
Index Terms
- A study of lustre networking over a 100 gigabit wide area network with 50 milliseconds of latency