Learning I/O Access Patterns to Improve Prefetching in SSDs

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12460)

Abstract

Flash-based solid state drives (SSDs) have established themselves as a higher-performance alternative to hard disk drives in cloud and mobile environments. Nevertheless, SSDs remain a performance bottleneck of computer systems due to their high I/O access latency. A common approach for improving access latency is prefetching, which predicts future block accesses and preloads them into main memory ahead of time. In this paper, we discuss the challenges of prefetching in SSDs, explain why prior approaches fail to achieve high accuracy, and present a neural-network-based prefetching approach that significantly outperforms the state of the art. To achieve high performance, we address the challenges of prefetching in very large sparse address spaces, as well as prefetching in a timely manner by predicting ahead of time. We collect I/O trace files from several real-world applications running on cloud servers and show that our proposed approach consistently outperforms existing stride prefetchers by up to 800× and prior prefetching approaches based on Markov chains by up to 8×. Furthermore, we propose an address mapping learning technique to demonstrate the applicability of our approach to previously unseen SSD workloads, and we perform a hyperparameter sensitivity study.
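For readers unfamiliar with the two baseline families named in the abstract, the following is a minimal sketch of a stride prefetcher and a first-order Markov-chain predictor over block addresses. It is not the authors' implementation or evaluation setup: the names `stride_predict` and `MarkovPredictor` and the toy trace are hypothetical, chosen only to illustrate the general idea of predicting the next block from recent accesses.

```python
from collections import Counter, defaultdict


def stride_predict(history):
    """Predict the next block address when the last two strides agree.

    Returns None if there is too little history or no stable stride.
    (Hypothetical helper, for illustration only.)
    """
    if len(history) < 3:
        return None
    last_stride = history[-1] - history[-2]
    prev_stride = history[-2] - history[-3]
    return history[-1] + last_stride if last_stride == prev_stride else None


class MarkovPredictor:
    """First-order Markov chain: counts which block follows which."""

    def __init__(self):
        # transitions[a][b] = number of times block b directly followed block a
        self.transitions = defaultdict(Counter)

    def train(self, trace):
        for current, following in zip(trace, trace[1:]):
            self.transitions[current][following] += 1

    def predict(self, current):
        followers = self.transitions.get(current)
        if not followers:
            return None  # block never seen as a predecessor
        return followers.most_common(1)[0][0]


# Toy trace (made up): a sequential run followed by a repeating irregular jump.
trace = [100, 108, 116, 124, 7, 900, 7, 900, 7, 900]
markov = MarkovPredictor()
markov.train(trace)
```

On this toy trace the stride predictor handles the sequential run but gives up on the irregular jump between blocks 7 and 900, while the Markov table recovers that repeated jump yet can say nothing about addresses it has never observed. That second limitation becomes severe in the very large sparse address spaces the abstract mentions.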

Acknowledgements

This work was supported in part by Samsung Semiconductor, Inc. and in part by NSF grants CCF-1823559 and CCF-1942754.

Author information

Corresponding author

Correspondence to Chandranil Chakraborttii.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Chakraborttii, C., Litz, H. (2021). Learning I/O Access Patterns to Improve Prefetching in SSDs. In: Dong, Y., Mladenić, D., Saunders, C. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science Track. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12460. Springer, Cham. https://doi.org/10.1007/978-3-030-67667-4_26

  • DOI: https://doi.org/10.1007/978-3-030-67667-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67666-7

  • Online ISBN: 978-3-030-67667-4

  • eBook Packages: Computer Science, Computer Science (R0)
