Fair bandwidth allocating and strip-aware prefetching for concurrent read streams and striped RAIDs in distributed file systems

Lee, Sangmin; Hyun, Soon J.; Kim, Hong-Yeon; Kim, Young-Kyun

doi:10.1007/s11227-018-2396-4

Published: 05 May 2018

Volume 74, pages 3904–3932 (2018)

The Journal of Supercomputing

Sangmin Lee (ORCID: orcid.org/0000-0002-3283-7564)^1,2,
Soon J. Hyun^1,
Hong-Yeon Kim^2 &
Young-Kyun Kim^2

Abstract

With a striped RAID (Redundant Array of Independent Disks), which consists of multiple disks and spreads data across them so that they can be accessed in parallel, distributed file systems (DFSs) can easily improve the performance of a single read stream (i.e., a series of sequential reads by one process). However, most existing DFSs suffer performance degradation under concurrent read streams (i.e., multiple series of sequential reads by concurrent processes). Furthermore, the performance of concurrent read streams on striped RAIDs in DFSs has rarely been reported so far. In this paper, we identify the problems that degrade this performance under different striped RAID configurations and resolve them with two methods: (1) fair allocation of network bandwidth among concurrent read streams and (2) strip-aware prefetching for each individual read stream. We show that our proposal outperforms all the existing DFSs by at least a factor of two for all kinds and configurations of striped RAIDs. Furthermore, the performance gap between our proposal and the existing DFSs widens as the number of striped disks increases.
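
The abstract names its two mechanisms without giving their details, so the following is only a minimal Python sketch of the general ideas, built on our own assumptions rather than on the authors' design: RAID-0 style round-robin striping, a prefetch window that is aligned to a strip boundary and rounded up to whole stripes so that every disk reads complete strips, and max-min (water-filling) fairness as one common notion of fair bandwidth allocation among streams. All names (`StripedRaid`, `strip_aligned_window`, `disks_touched`, `max_min_fair_share`) and all parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripedRaid:
    """Hypothetical RAID-0 style layout: strips are placed round-robin on disks."""
    num_disks: int   # number of disks in the striped array
    strip_size: int  # bytes stored contiguously on one disk before moving on

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a logical byte offset to (disk index, global strip number)."""
        strip = offset // self.strip_size
        return strip % self.num_disks, strip


def strip_aligned_window(raid: StripedRaid, offset: int, want: int) -> tuple[int, int]:
    """Expand a prefetch request [offset, offset + want) so that it starts on a
    strip boundary and spans whole stripes (num_disks strips), so that each
    disk reads complete strips. Returns (start, length) in bytes."""
    stripe = raid.strip_size * raid.num_disks
    start = (offset // raid.strip_size) * raid.strip_size
    length = offset + want - start
    length = -(-length // stripe) * stripe  # ceiling division to whole stripes
    return start, length


def disks_touched(raid: StripedRaid, start: int, length: int) -> list[int]:
    """Disks that the byte range [start, start + length) reads from."""
    first = start // raid.strip_size
    last = (start + length - 1) // raid.strip_size
    return sorted({s % raid.num_disks for s in range(first, last + 1)})


def max_min_fair_share(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Water-filling max-min fair split of link bandwidth among read streams:
    streams demanding no more than an equal share get their full demand, and
    the leftover is divided equally among the remaining streams."""
    alloc = {s: 0.0 for s in demands}
    remaining = dict(demands)
    while remaining and capacity > 0:
        share = capacity / len(remaining)
        satisfied = {s: d for s, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining stream wants more than the equal share.
            for s in remaining:
                alloc[s] = share
            break
        for s, d in satisfied.items():
            alloc[s] = d
            capacity -= d
            del remaining[s]
    return alloc


if __name__ == "__main__":
    raid = StripedRaid(num_disks=4, strip_size=64 * 1024)
    print(raid.locate(100_000))                  # (1, 1)
    start, length = strip_aligned_window(raid, 100_000, 128 * 1024)
    print(start, length)                         # 65536 262144
    print(disks_touched(raid, start, length))    # [0, 1, 2, 3]
    print(max_min_fair_share(100.0, {"a": 10.0, "b": 50.0, "c": 80.0}))
    # {'a': 10.0, 'b': 45.0, 'c': 45.0}
```

Rounding a window out to whole stripes is one simple way to keep all disks busy with full-strip reads, and water-filling keeps a greedy stream from starving the others on a shared link; the paper's actual strip-aware policy and fairness criterion may differ from both.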

Similar content being viewed by others

APS: adaptable prefetching scheme to different running environments for concurrent read streams in distributed file systems

Article 23 March 2018

Sangmin Lee, Soon J. Hyun, … Young-Kyun Kim

Dynamic Stripe Management Mechanism in Distributed File Systems

A Prefetching Mechanism Based on MooseFS

Acknowledgements

This work was supported by Institute for Information & communications Technology Promotion (IITP) Grant funded by the Korea Government (MSIP) (No. R0126-15-1082, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).

Author information

Authors and Affiliations

School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Sangmin Lee & Soon J. Hyun
High Performance Computing Research Group, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea
Sangmin Lee, Hong-Yeon Kim & Young-Kyun Kim

Corresponding author

Correspondence to Sangmin Lee.

About this article

Cite this article

Lee, S., Hyun, S.J., Kim, HY. et al. Fair bandwidth allocating and strip-aware prefetching for concurrent read streams and striped RAIDs in distributed file systems. J Supercomput 74, 3904–3932 (2018). https://doi.org/10.1007/s11227-018-2396-4

Issue Date: August 2018
DOI: https://doi.org/10.1007/s11227-018-2396-4
