Informed Prefetching for Distributed Multi-Level Storage Systems

Al Assaf, Maen M.; Jiang, Xunfei; Qin, Xiao; Abid, Mohamed Riduan; Qiu, Meikang; Zhang, Jifu

doi:10.1007/s11265-017-1277-z

Published: 30 August 2017

Journal of Signal Processing Systems, Volume 90, pages 619–640 (2018)

Abstract

In this paper, we present IPODS, an informed prefetching technique that uses application-disclosed access patterns to prefetch hinted blocks in distributed multi-level storage systems. We develop a prefetching pipeline in IPODS, in which the informed prefetching process is divided into a set of independent prefetching steps that are spread across multiple storage levels in a distributed system. In the IPODS system, while data blocks are prefetched from hard disks to memory buffers in remote storage servers, data blocks already buffered in the servers are prefetched over the network to the clients’ local caches. We show that these two prefetching steps can be pipelined to improve the I/O performance of distributed storage systems. IPODS differs from existing prefetching schemes in two ways. First, it reduces applications’ I/O stalls by keeping hinted data in clients’ local caches and storage servers’ fast buffers (e.g., solid state disks). Second, in a prefetching pipeline, multiple informed prefetching mechanisms coordinate semi-dependently to fetch blocks (1) from low-level (slow) to high-level (fast) storage devices in servers and (2) from high-level devices in servers to the clients’ local caches. The prefetching pipeline in IPODS hides network latency, thereby reducing the overall I/O access time of distributed storage systems. Experiments driven by a wide range of real-world I/O traces show that IPODS improves the I/O performance of distributed storage systems by 6%.
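
The benefit of overlapping the two prefetching steps can be illustrated with a small timing model. The sketch below is not the IPODS implementation. It assumes a single disk and a single network link, each handling one block at a time, with unlimited buffer space in between. The function names and the per-block latencies are hypothetical and chosen only for illustration.

```python
def sequential_time(n_blocks, t_disk, t_net):
    """Total time when each block is read from disk and then sent over
    the network before the next block is started (no overlap)."""
    return n_blocks * (t_disk + t_net)


def pipelined_time(n_blocks, t_disk, t_net):
    """Total time when the disk-to-server-buffer step and the
    server-to-client step overlap as a two-stage pipeline."""
    disk_done = 0.0  # time at which the disk finishes the current block
    net_done = 0.0   # time at which the network finishes the current block
    for _ in range(n_blocks):
        # The disk starts the next block as soon as it finishes the last one.
        disk_done += t_disk
        # The network transfer waits for both the block to be buffered
        # and the link to be free.
        net_done = max(net_done, disk_done) + t_net
    return net_done


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds per block.
    n, t_disk, t_net = 4, 5.0, 2.0
    print("sequential:", sequential_time(n, t_disk, t_net))
    print("pipelined: ", pipelined_time(n, t_disk, t_net))
```

In this model the pipelined total is the latency of the faster step for one block plus the slower step's latency for every block, so the faster step is hidden behind the slower one.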

Acknowledgments

Xiao Qin’s work is supported by the U.S. National Science Foundation under Grants IIS-1618669, CCF-0845257 (CAREER), CNS-0917137, CNS-0757778, CCF-0742187, CNS-0831502, CNS-0855251, and OCI-0753305. Jifu Zhang’s study is supported by the National Natural Science Foundation of P.R. China under grant No.61572343.

Author information

Authors and Affiliations

King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
Maen M. Al Assaf
Computer Science Department, Earlham College, Richmond, IN, 47374, USA
Xunfei Jiang
Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
Xiao Qin
Department of Computer Science, Al Akhawayn University, Ifrane, Morocco
Mohamed Riduan Abid
Department of Computer Science, Pace University, New York, NY, 10038, USA
Meikang Qiu
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, 030024, China
Jifu Zhang

Corresponding author

Correspondence to Maen M. Al Assaf.

About this article

Cite this article

Al Assaf, M.M., Jiang, X., Qin, X. et al. Informed Prefetching for Distributed Multi-Level Storage Systems. J Sign Process Syst 90, 619–640 (2018). https://doi.org/10.1007/s11265-017-1277-z

Received: 06 October 2013
Revised: 11 May 2017
Accepted: 21 August 2017
Published: 30 August 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s11265-017-1277-z
