PATHFINDER: Practical Real-Time Learning for Data Prefetching

Authors:
Lin Jia, University of Utah, Salt Lake City, USA (https://orcid.org/0009-0007-8496-0058)
James Patrick Mcmahon, University of Utah, Salt Lake City, USA (https://orcid.org/0009-0006-3718-794X)
Sumanth Gudaparthi, University of Utah, Salt Lake City, USA (https://orcid.org/0000-0002-5008-9870)
Shreyas Singh, University of Utah, Salt Lake City, USA (https://orcid.org/0009-0004-7338-0267)
Rajeev Balasubramonian, University of Utah, Salt Lake City, USA (https://orcid.org/0009-0009-4093-5904)

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, April 2024, Pages 785–800. https://doi.org/10.1145/3620666.3651332

Published: 27 April 2024

ABSTRACT

Data prefetching is vital in high-performance processors, and a large body of research has introduced many approaches to accurate prefetching: stride detection, address-correlating prefetchers, delta pattern detection, irregular pattern detection, etc. Most recently, a few works have leveraged advances in machine learning and deep neural networks to design prefetchers. These neural-inspired prefetchers observe data access patterns and develop a trained model that can then make accurate predictions for future accesses. A significant impediment to the success of these prefetchers is their high implementation cost, for both inference and training. These models cannot be trained in real time, i.e., they have to be trained beforehand with a large benchmark suite. This results in a large model (which increases the inference overhead), and the model can only successfully predict patterns that are similar to patterns in the training set.

In this work, we explore the potential of using spiking neural networks (SNNs) to learn and predict data access patterns, specifically address deltas. While prior work has leaned on the recent success of trained artificial neural networks, we hypothesize that spiking neural networks are a better fit for real-time data prefetching. Spiking neural networks rely on the spike-timing-dependent plasticity (STDP) algorithm to learn while performing inference; STDP is a low-cost, local learning algorithm that can quickly observe and react to the current stream of accesses. It is therefore possible to achieve high accuracy on previously unseen access patterns with a relatively small spiking neural network. This paper makes the case that spiking neurons and STDP offer a complexity-effective approach to leveraging machine learning for data prefetching. We show that the proposed Pathfinder prefetcher is competitive with other state-of-the-art prefetchers on a range of benchmarks. Pathfinder can be implemented at 0.5 W with an area footprint of only 0.23 mm² (12 nm technology). This work shows that neural-inspired prefetching can be both practical and capable of high performance, but more innovations in SNN/STDP prefetch algorithms are required to fully maximize their potential.
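As an illustration of the two ingredients named in the abstract, the sketch below computes address deltas from an access stream and applies a generic pair-based STDP weight update. It is not the Pathfinder design: the function names, the 64-byte cache line, and the constants `a_plus`, `a_minus`, and `tau` are all assumptions chosen for the example, and the update is the textbook exponential STDP rule rather than anything taken from the paper.

```python
import math

def line_deltas(addresses, line_bytes=64):
    """Deltas between consecutive accesses, in cache-line units.
    line_bytes=64 is an assumed cache-line size."""
    lines = [a // line_bytes for a in addresses]
    return [b - a for a, b in zip(lines, lines[1:])]

def stdp_dw(dt, a_plus=0.05, a_minus=0.055, tau=20.0):
    """Pair-based STDP weight change for dt = t_post - t_pre.
    Pre-before-post (dt > 0) strengthens the synapse,
    post-before-pre (dt < 0) weakens it. Constants are illustrative."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        return -a_minus * math.exp(dt / tau)
    return 0.0

def apply_stdp(weight, pre_times, post_times, w_min=0.0, w_max=1.0):
    """Accumulate the change over all spike pairs, then clip the weight."""
    for t_pre in pre_times:
        for t_post in post_times:
            weight += stdp_dw(t_post - t_pre)
    return min(w_max, max(w_min, weight))

# Byte addresses 0, 64, 128, 320 fall in cache lines 0, 1, 2, 5.
print(line_deltas([0, 64, 128, 320]))  # [1, 1, 3]
```

The update is local in the sense the abstract describes: each synapse changes using only the spike times of the two neurons it connects, so learning can proceed alongside inference without a separate training pass.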

Published in
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
April 2024
1106 pages
ISBN: 9798400703867
DOI: 10.1145/3620666
General Chairs: Nael Abu-Ghazaleh, Rajiv Gupta
Program Chairs: Madan Musuvathi, Dan Tsafrir
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher: Association for Computing Machinery, New York, NY, United States
Qualifiers: research-article
Overall Acceptance Rate: 535 of 2,713 submissions, 20%