Gfarm/BB — Gfarm File System for Node-Local Burst Buffer

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Burst buffers have become a major component for meeting the I/O performance requirements of bursty HPC traffic. This paper proposes Gfarm/BB, a burst-buffer file system that efficiently exploits node-local storage. Although node-local storage improves storage performance, it is available only during the job allocation. Gfarm/BB therefore has to deliver better data-access and metadata performance while being constructed on demand before the job executes. To improve read and write performance, it exploits file descriptor passing and remote direct memory access (RDMA). Because it is a temporary file system, it improves metadata performance by omitting persistency and redundancy. With RDMA, write and read bandwidth improve by 1.7x and 2.2x, respectively, compared with IP over InfiniBand (IPoIB). Directory creation reaches 14 700 operations per second, which is 13.4x faster than the fully persistent and redundant case. Constructing Gfarm/BB takes 0.31 seconds on 2 nodes. The IOR benchmark and the ARGOT-IO application I/O benchmark show scalable performance improvements from exploiting the locality of node-local storage. Compared with BeeOND, Gfarm/BB performs 2.6x and 2.4x better in the IOR write and read benchmarks, respectively, and 2.5x better in ARGOT-IO.
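
The abstract names file descriptor passing as one of the two mechanisms behind the read and write improvements. As a rough illustration of that general operating-system mechanism only (not Gfarm/BB's actual code), the sketch below hands an open file descriptor from one socket endpoint to another over a Unix domain socket, so the receiver can read local data directly instead of having it copied through the sender. The function names `send_file_descriptor`, `receive_file_descriptor`, and `demo` are hypothetical, and the example assumes a Unix-like system with Python 3.9 or later.

```python
import os
import socket
import tempfile


def send_file_descriptor(sock: socket.socket, fd: int) -> None:
    """Send an open descriptor as SCM_RIGHTS ancillary data (hypothetical helper)."""
    # At least one byte of ordinary payload must accompany the descriptor.
    socket.send_fds(sock, [b"fd"], [fd])


def receive_file_descriptor(sock: socket.socket) -> int:
    """Receive one descriptor sent by send_file_descriptor (hypothetical helper)."""
    _data, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]


def demo() -> bytes:
    """Pass a descriptor between two socket endpoints and read through it."""
    # A socketpair stands in for a storage daemon and a client on the same node.
    daemon_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.NamedTemporaryFile() as local_file:
            local_file.write(b"node-local data")
            local_file.flush()

            # The "daemon" opens the file and passes the descriptor to the client.
            fd = os.open(local_file.name, os.O_RDONLY)
            try:
                send_file_descriptor(daemon_end, fd)
            finally:
                os.close(fd)  # the in-flight copy stays valid after this close

            # The "client" reads the file directly through the received descriptor.
            received_fd = receive_file_descriptor(client_end)
            try:
                return os.pread(received_fd, 64, 0)
            finally:
                os.close(received_fd)
    finally:
        daemon_end.close()
        client_end.close()


if __name__ == "__main__":
    print(demo())
```

In this sketch only the descriptor crosses the socket; the file contents are read by the receiving side with an ordinary `pread`, which is the property that makes the technique attractive when the data already sits on node-local storage.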

Author information

Corresponding author

Correspondence to Osamu Tatebe.

Electronic supplementary material

ESM 1

(PDF 102 kb)

About this article

Cite this article

Tatebe, O., Moriwake, S. & Oyama, Y. Gfarm/BB — Gfarm File System for Node-Local Burst Buffer. J. Comput. Sci. Technol. 35, 61–71 (2020). https://doi.org/10.1007/s11390-020-9803-z

  • DOI: https://doi.org/10.1007/s11390-020-9803-z
