Abstract
Distributed file systems (DFSs) are widely used in many application areas. A key requirement is high performance for concurrent read streams (i.e., multiple series of sequential reads issued by concurrent processes). Although concurrent read streams have been studied extensively in local file systems (LFSs), they have seldom been studied in DFSs under different running environments (i.e., different types of storage devices and various network delays). Furthermore, most existing DFSs perform far worse than an LFS (e.g., EXT4). To achieve high performance for concurrent read streams, this study introduces a populating effect, in which subsequent reads are continually sent to a storage server, and proposes an adaptable prefetching scheme (APS) that obtains this effect even in different running environments. APS thereby resolves all the problems that we identified as dramatically degrading performance in existing DFSs. Evaluation results on three different types of storage devices and under various network delays show that APS (1) achieves almost the same performance as an LFS from an individual server and (2) minimizes the performance degradation of random reads.

Acknowledgements
This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0126-15-1082, Management of Developing ICBMS (IoT, Cloud, Bigdata, Mobile, Security) Core Technologies and Development of Exascale Cloud Storage Technology).
Cite this article
Lee, S., Hyun, S.J., Kim, HY. et al. APS: adaptable prefetching scheme to different running environments for concurrent read streams in distributed file systems. J Supercomput 74, 2870–2902 (2018). https://doi.org/10.1007/s11227-018-2333-6
DOI: https://doi.org/10.1007/s11227-018-2333-6