Abstract
Storing enormous amounts of data on hybrid storage systems has become a widely accepted solution for today’s production-level applications, because it trades off performance against cost. However, how to improve the performance of large-scale storage systems with hybrid components (e.g., solid state disks, hard drives, and tapes) and complicated user behaviors has not been fully explored. In this paper, we conduct an in-depth case study, which we call FastStor, on designing a high-performance hybrid storage system to support one of the world’s largest satellite image distribution systems, operated by the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center. We demonstrate how to combine conventional caching policies with innovative current-popularity-oriented and user-specific prefetching algorithms to improve the performance of the EROS system. We evaluate the effectiveness of our proposed solution using over 5 million real-world user download requests provided by EROS. Our experimental results show that, using the Least Recently Used (LRU) caching policy alone, we achieve an overall hit ratio of 64% or 70% on a 100 TB or 200 TB FTP server farm composed of Solid State Disks (SSDs), respectively. The hit ratio improves further to 70% (for 100 TB of SSDs) and 76% (for 200 TB of SSDs) when intelligent prefetching algorithms are used together with LRU.
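As a rough illustration of how a hit ratio of this kind can be measured, the following Python sketch replays a request trace through a size-bounded LRU cache with an optional prefetch hook. It is not the FastStor implementation: the class `LRUCache`, the function `hit_ratio`, the `prefetch` callback, and the toy trace are all hypothetical names and data introduced here, and the prefetch rule is a stand-in for the popularity-oriented and user-specific algorithms the abstract mentions.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded LRU cache keyed by file id (capacity in arbitrary size units)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.items = OrderedDict()  # file_id -> size, least recently used first

    def access(self, file_id, size):
        """Record a user request; return True on a hit, False on a miss."""
        if file_id in self.items:
            self.items.move_to_end(file_id)  # mark as most recently used
            return True
        self.insert(file_id, size)  # a miss loads the file into the cache
        return False

    def insert(self, file_id, size):
        """Load a file (after a miss or a prefetch), evicting LRU entries as needed."""
        if file_id in self.items or size > self.capacity:
            return  # already cached, or too large to ever fit
        while self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)  # drop the LRU entry
            self.used -= evicted_size
        self.items[file_id] = size
        self.used += size


def hit_ratio(requests, capacity, prefetch=None):
    """Replay (file_id, size) requests and return the fraction served from cache.

    `prefetch` is a hypothetical hook: given the file just requested, it returns
    an iterable of (file_id, size) pairs to load into the cache ahead of demand.
    """
    cache = LRUCache(capacity)
    hits = 0
    for file_id, size in requests:
        if cache.access(file_id, size):
            hits += 1
        if prefetch is not None:
            for extra_id, extra_size in prefetch(file_id):
                cache.insert(extra_id, extra_size)
    return hits / len(requests) if requests else 0.0


# Toy trace (made-up data): scene "b" tends to be requested right after scene "a".
trace = [("a", 1), ("b", 1)]
lru_only = hit_ratio(trace, capacity=3)
with_prefetch = hit_ratio(
    trace, capacity=3, prefetch=lambda fid: [("b", 1)] if fid == "a" else []
)
```

On this toy trace, LRU alone misses both requests, while the prefetch hook turns the second request into a hit. Only user requests count toward the hit ratio; prefetched loads occupy cache space but are not themselves counted as accesses.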
Acknowledgements
The authors sincerely appreciate the comments and feedback from the anonymous reviewers. The work reported in this paper is supported by the U.S. National Science Foundation under Grants No. CNS-0915762 and CNS-1212535. We also gratefully acknowledge the support from the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center.
Cite this article
Zong, Z., Fares, R., Romoser, B. et al. FastStor: improving the performance of a large scale hybrid storage system via caching and prefetching. Cluster Comput 17, 593–604 (2014). https://doi.org/10.1007/s10586-013-0304-5
DOI: https://doi.org/10.1007/s10586-013-0304-5