DOI: 10.1145/3620666.3651332 (ASPLOS Conference Proceedings)

PATHFINDER: Practical Real-Time Learning for Data Prefetching

Published: 27 April 2024

ABSTRACT

Data prefetching is vital in high-performance processors, and a large body of research has introduced many approaches for accurate prefetching: stride detection, address-correlating prefetchers, delta pattern detection, irregular pattern detection, and others. Most recently, a few works have leveraged advances in machine learning and deep neural networks to design prefetchers. These neural-inspired prefetchers observe data access patterns and build a trained model that can then make accurate predictions of future accesses. A significant impediment to the success of these prefetchers is their high implementation cost for both inference and training. These models cannot be trained in real time; instead, they must be trained beforehand on a large benchmark suite. This results in a large model, which increases inference overhead, and the model succeeds only on patterns that are similar to those in the training set.

In this work, we explore the potential of using spiking neural networks (SNNs) to learn and predict data access patterns, specifically address deltas. While prior work has leaned on the recent success of trained artificial neural networks, we hypothesize that spiking neural networks are a better fit for real-time data prefetching. Spiking neural networks rely on the spike-timing-dependent plasticity (STDP) algorithm to learn while performing inference. STDP is a low-cost, local learning algorithm that can quickly observe and react to the current stream of accesses. It is therefore possible to achieve high accuracy on previously unseen access patterns with a relatively small spiking neural network. This paper makes the case that spiking neurons and STDP offer a complexity-effective approach to leveraging machine learning for data prefetching. We show that the proposed Pathfinder prefetcher is competitive with other state-of-the-art prefetchers on a range of benchmarks. Pathfinder can be implemented with a power of 0.5 W and an area footprint of only 0.23 mm² (12 nm technology). This work shows that neural-inspired prefetching can be both practical and capable of high performance, but more innovations in SNN/STDP prefetch algorithms are required to fully realize their potential.
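To make the idea of learning while inferring concrete, the following is a minimal Python sketch of classic pair-based STDP applied to a toy delta predictor. It is an illustration under stated assumptions, not Pathfinder's design. The following elements are all hypothetical choices made for this sketch:

* the class `ToyDeltaSTDP`
* the fixed delta vocabulary
* the layout with one input neuron and one output neuron per delta, where a teacher spike fires on the observed delta
* the nearest-spike pairing
* the constants `A_PLUS`, `A_MINUS`, `TAU_PLUS`, and `TAU_MINUS`

The sketch shows the two properties the abstract highlights. First, each weight update uses only the spike times of the two neurons that a synapse connects, which makes the rule local. Second, `predict` and `observe` can be interleaved on a live stream, so no offline training pass is needed.

```python
import math

# Illustrative constants (assumptions, not values from the paper).
# Time is measured in memory accesses.
A_PLUS, A_MINUS = 0.05, 0.03    # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 2.0, 2.0  # STDP time constants
W_MIN, W_MAX = 0.0, 1.0         # hard weight bounds


def stdp_dw(t_pre: float, t_post: float) -> float:
    """Classic pair-based STDP: weight change for one pre/post spike pair."""
    dt = t_post - t_pre
    if dt > 0:  # pre fired before post (causal pairing): potentiate
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:  # post fired before pre (anti-causal pairing): depress
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0  # simultaneous spikes: no change


class ToyDeltaSTDP:
    """Hypothetical toy model, NOT Pathfinder's architecture.

    There is one input neuron and one output neuron per delta in a fixed
    vocabulary, and they are fully connected by weights w[i][j]. Each observed
    delta makes its input neuron spike (pre) and, as a teacher signal, its
    output neuron spike (post) at the same time step.
    """

    def __init__(self, vocab):
        self.vocab = list(vocab)
        n = len(self.vocab)
        self.w = [[0.5] * n for _ in range(n)]
        self.last_pre = [None] * n   # latest spike time of each input neuron
        self.last_post = [None] * n  # latest spike time of each output neuron
        self.t = 0

    def predict(self, delta):
        """Inference: return the delta whose output neuron has the strongest synapse from `delta`."""
        row = self.w[self.vocab.index(delta)]
        return self.vocab[max(range(len(row)), key=row.__getitem__)]

    def observe(self, delta):
        """Online learning: apply STDP to every synapse touched by this step's spikes."""
        k, t = self.vocab.index(delta), self.t
        # Record this step's spikes first, so the neuron pair for `delta`
        # pairs with its simultaneous partner (dt = 0, no change).
        self.last_pre[k] = self.last_post[k] = t
        for i, t_pre in enumerate(self.last_pre):  # new post spike on output k
            if t_pre is not None:
                self._update(i, k, stdp_dw(t_pre, t))
        for j, t_post in enumerate(self.last_post):  # new pre spike on input k
            if t_post is not None:
                self._update(k, j, stdp_dw(t, t_post))
        self.t += 1

    def _update(self, i, j, dw):
        # Additive update with hard bounds, keeping each weight in [W_MIN, W_MAX].
        self.w[i][j] = min(W_MAX, max(W_MIN, self.w[i][j] + dw))
```

Feeding `observe` a repeating delta stream such as 1, 2, 3, 1, 2, 3, ... strengthens the synapse from each delta to its successor, so `predict(1)` returns 2 after a few repetitions.

The choice `A_PLUS > A_MINUS` matters for a strictly alternating stream (A, B, A, B, ...). In that case the same synapse receives a causal and an anti-causal pairing one step apart. Also, with hard bounds, very long streams can saturate weights at `W_MAX` and produce ties.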
