FastStor: improving the performance of a large scale hybrid storage system via caching and prefetching

  • Original Paper
  • Journal: Cluster Computing

Abstract

Storing enormous amounts of data on hybrid storage systems has become a widely accepted way for today’s production-level applications to balance performance against cost. However, how to improve the performance of large scale storage systems that combine heterogeneous components (e.g. solid state disks, hard drives and tapes) and serve complicated user behaviors has not been fully explored. In this paper, we conduct an in-depth case study (which we call FastStor) on designing a high performance hybrid storage system to support one of the world’s largest satellite image distribution systems, operated by the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center. We demonstrate how to combine conventional caching policies with innovative current-popularity-oriented and user-specific prefetching algorithms to improve the performance of the EROS system. We evaluate the effectiveness of our proposed solution using over 5 million real world user download requests provided by EROS. Our experimental results show that using the Least Recently Used (LRU) caching policy alone, we achieve an overall hit ratio of 64% on a 100 TB FTP server farm composed of Solid State Disks (SSDs) and 70% on a 200 TB farm. The hit ratio can be further improved to 70% (for 100 TB of SSDs) and 76% (for 200 TB of SSDs) if intelligent prefetching algorithms are used together with LRU.
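
As a rough illustration of the kind of evaluation the abstract describes, the following Python sketch replays a download trace through a byte-capacity LRU cache and reports the hit ratio, with an optional prefetch hook. It is a minimal sketch under stated assumptions, not the FastStor implementation: the names `LRUCache`, `replay`, and `predictor` are hypothetical, the trace format of `(file_id, size)` pairs is assumed, and the predictor stands in for whatever popularity-based or user-specific prefetching logic the paper actually uses.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # file_id -> size, least recently used first
        self.hits = 0
        self.misses = 0

    def _insert(self, file_id, size):
        if size > self.capacity:
            return  # a file larger than the whole cache is never stored
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)  # evict LRU entry
            self.used -= evicted_size
        self.items[file_id] = size
        self.used += size

    def request(self, file_id, size):
        """Serve one user request; returns True on a cache hit."""
        if file_id in self.items:
            self.items.move_to_end(file_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self._insert(file_id, size)
        return False

    def prefetch(self, file_id, size):
        """Load a file ahead of demand without counting it as a request."""
        if file_id not in self.items:
            self._insert(file_id, size)

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(trace, capacity_bytes, predictor=None):
    """Replay (file_id, size) requests; predictor is a hypothetical hook that
    returns an iterable of (file_id, size) pairs to prefetch after a request."""
    cache = LRUCache(capacity_bytes)
    for file_id, size in trace:
        cache.request(file_id, size)
        if predictor is not None:
            for pf_id, pf_size in predictor(file_id):
                cache.prefetch(pf_id, pf_size)
    return cache.hit_ratio()
```

Calling `replay(trace, capacity_bytes)` gives the LRU-only hit ratio for a trace, and passing a `predictor` shows how prefetched files can turn later misses into hits. Prefetched files still compete for the same capacity, so a poor predictor can also lower the hit ratio by evicting useful data.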

Acknowledgements

The authors sincerely appreciate the comments and feedback from the anonymous reviewers. The work reported in this paper is supported by the U.S. National Science Foundation under Grants No. CNS-0915762 and CNS-1212535. We also gratefully acknowledge the support from the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center.

Author information

Corresponding author

Correspondence to Ziliang Zong.

About this article

Cite this article

Zong, Z., Fares, R., Romoser, B. et al. FastStor: improving the performance of a large scale hybrid storage system via caching and prefetching. Cluster Comput 17, 593–604 (2014). https://doi.org/10.1007/s10586-013-0304-5

  • DOI: https://doi.org/10.1007/s10586-013-0304-5
