Intelligent cache prefetchers in HPC architecture

Fargo, Farah; Diamond, Mitchell; Franza, Olivier; Foose, Paul; Adiletta, Jack; Adiletta, Matthew; Steely, Simon C.

doi:10.1007/s10586-024-04854-0

Published: 21 January 2025

Cluster Computing, Volume 28, article number 154 (2025)

Farah Fargo¹, Mitchell Diamond¹, Olivier Franza¹, Paul Foose¹, Jack Adiletta¹, Matthew Adiletta¹ & Simon C. Steely Jr¹

Abstract

Reinforcement learning (RL) is revolutionizing the field of artificial intelligence (AI) and represents a step toward building optimal, autonomous systems with a higher level of understanding (Arulkumaran et al. in IEEE Signal Processing Mag 34(6):26–38, 2017). One of the main goals of AI is to produce fully autonomous agents that interact with several features and learn the optimal behavior. Applications vary in their data access patterns, and a static hardware configuration is not ideal for all phases of a workload. Today's Xeon cores have multiple data prefetchers that fetch the data expected to be used next; however, these prefetchers may interact in destructive ways. This destructive behavior can cause several problems, such as increased cache pollution, memory bandwidth bottlenecks, and additional occupancy in critical-path demand queues. Managing the aggressiveness of the prefetchers is necessary to mitigate these problems. Current hardware solutions manage prefetcher aggressiveness by monitoring telemetry such as memory bandwidth and accuracy. However, this telemetry does not necessarily correlate with overall system performance. In addition, other solutions optimize prefetchers individually rather than allowing multiple features to work together. This research introduces hierarchical smart agents that use reinforcement learning to find the optimal aggressiveness for the Mid-Level Cache (MLC) prefetchers at run time, managed by the Smart Prefetchers Manager (SPM). We have expanded our previous work, evaluated more workloads on a hierarchical model, and applied reinforcement learning in addition to an offline training approach. The approach is implemented and evaluated in a single-core, single-process environment to optimize the three MLC prefetchers at run time. Results show that reinforcement learning achieves up to a 7.18% improvement in instructions per cycle (IPC) over the state-of-the-art hardware solution.
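To make the idea of learning prefetcher aggressiveness at run time concrete, the following is a minimal sketch of tabular Q-learning (Watkins and Dayan, listed in the references). It is not the authors' implementation: the abstract does not specify the state, action, or reward definitions, nor the structure of the SPM hierarchy. Everything named here is an assumption made for illustration, including the four aggressiveness levels (`LEVELS`), the two workload phases and their best settings (`PHASE_OPTIMA`), and the toy reward `simulated_ipc`, which stands in for an IPC measurement.

```python
import itertools
import random

# Assumption: each of the three prefetchers has four aggressiveness levels.
LEVELS = (0, 1, 2, 3)
# One action is a joint setting for all three prefetchers (64 combinations).
ACTIONS = list(itertools.product(LEVELS, repeat=3))

# Assumption: two workload phases, each with a hypothetical best joint setting.
PHASE_OPTIMA = {0: (3, 1, 0), 1: (0, 2, 2)}

GAMMA = 0.9
# Optimistic initial value: the largest possible return, 1.0 / (1 - GAMMA),
# so that every untried setting looks attractive until it has been measured.
Q_INIT = 10.0


def simulated_ipc(phase, action):
    """Toy stand-in for measured IPC; not derived from the paper."""
    best = PHASE_OPTIMA[phase]
    distance = sum(abs(a - b) for a, b in zip(action, best))
    return 1.0 - 0.05 * distance


def train(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Learn Q(phase, action) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {(p, a): Q_INIT for p in PHASE_OPTIMA for a in ACTIONS}
    phases = list(PHASE_OPTIMA)
    phase = phases[0]
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore a random joint setting
        else:
            action = max(ACTIONS, key=lambda a: q[(phase, a)])  # exploit
        reward = simulated_ipc(phase, action)
        next_phase = rng.choice(phases)  # the workload moves to some phase
        best_next = max(q[(next_phase, a)] for a in ACTIONS)
        # Q-learning update toward the bootstrapped target.
        q[(phase, action)] += alpha * (reward + GAMMA * best_next - q[(phase, action)])
        phase = next_phase
    return q


def best_action(q, phase):
    """Return the greedy joint aggressiveness setting for a phase."""
    return max(ACTIONS, key=lambda a: q[(phase, a)])


if __name__ == "__main__":
    table = train()
    for p in PHASE_OPTIMA:
        print("phase", p, "-> levels", best_action(table, p))
```

In this sketch a single agent selects the joint setting for all three prefetchers, which is one simple way to let them be tuned together rather than individually. A hierarchical arrangement such as the one the abstract describes would divide this decision among several agents, and a real controller would replace `simulated_ipc` with hardware or simulator measurements.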

Data availability

No datasets were generated or analysed during the current study.

References

H. Devarajan, A. Kougkas, H. Zheng, V. Vishwanath and X. -H. Sun, "Stimulus: Accelerate Data Management for Scientific AI applications in HPC," 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Taormina, 2022
M. J. Adiletta, F. Fargo, M. Diamond, J. Adiletta, O. Franza and S. Steely, "A Reinforcement Learning Approach to Optimize Cache Prefetcher Aggressiveness at Run-Time," 2023 Tenth International Conference on Software Defined Systems (SDS), San Antonio, 2023
J. Doweck, Inside intel core microarchitecture and smart memory access-White Paper, Intel.
Le, H.Q., et al.: IBM POWER6 microarchitecture. IBM J. Res. Dev. 51, 639–662 (2007)
Owen, J., Steinman, M.: Northbridge architecture of AMD’s Griffin microprocessor family. IEEE Micro 28, 10–18 (2008)
J. Casazza, First the tick now the tock: Intel microarchitecture (Nehalem) - White Paper, Intel, 2009.
E. Ebrahimi, C. J. Lee, O. Mutlu and Y. N. Patt, "Prefetch-aware shared-resource management for multi-core systems," 2011 38th Annual International Symposium on Computer Architecture (ISCA), San Jose, 2011.
A. E. Papathanasiou and M. L. Scott. Aggressive prefetching: an idea whose time has come. Hot Topics in Operating Systems, 2005
W. Heirman, K. D. Bois, Y. Vandriessche, S. Eyerman, and I. Hur, “Near-side prefetch throttling: Adaptive prefetching for high-performance many-core processors,” in Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT ’18. New York: ACM, 2018, pp. 28:1–28:11. https://doi.org/10.1145/3243176.3243181
Santhosh Srinath, Onur Mutlu, Hyesoon Kim, and Yale N Patt. 2007. Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 63–74.
M. Kumar, S. Ali Khan, A. Bhatia, V. Sharma and P. Jain, "A Conceptual introduction of Machine Learning Algorithms," 2023 1st International Conference on Intelligent Computing and Research Trends (ICRT), Roorkee, 2023, pp. 1–7, https://doi.org/10.1109/ICRT57042.2023.10146676.
M. S. Kupriyanov, Y. A. Shichkina and S. E. Ilin, "Implementation of Temporal Action Detection Workflow for Video Data Analysis with the Use of Machine Learning Operations," 2024 V International Conference on Neural Networks and Neurotechnologies (NeuroNT), Saint Petersburg, Russian Federation, 2024.
Shih-wei Liao, Tzu-Han Hung, Donald Nguyen, Chinyen Chou, Chiaheng Tu, and Hucheng Zhou. 2009. Machine Learning-based Prefetch Optimization for Data Center Applications. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC ’09). 1–10.
J. Hiebel et al. Machine learning for fine-grained hardware prefetcher control. Proc. ICPP, p. 3, 2019.
Saami Rahman et al. “Maximizing hardware prefetch effectiveness with machine learning”. In: High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on. IEEE. 2015, pp. 383–389.
A. Aziz, M. Cireno, E. Barros and B. Prado, "Balanced prefetching aggressiveness controller for NoC-based multiprocessor", SBCCI '14 Proceedings of the 27th Symposium on Integrated Circuits and Systems Design, 2014.
Panda, B., Balachandran, S.: CAFFEINE: a utility-driven prefetcher aggressiveness engine for multicores. ACM Trans. Archit. Code Optim. 12, 25 (2015). https://doi.org/10.1145/2806891
Panda, B.: SPAC: a synergistic prefetcher aggressiveness controller for multi-core systems. IEEE Trans. Comput. 65, 3740–3753 (2016)
Intel 2016. Intel 64 and IA-32 Architectures Developer’s Manual: Volume 3C. Intel
Saami Rahman, Martin Burtscher, Ziliang Zong, and Apan Qasem. 2015. Maximizing Hardware Prefetch Effectiveness with Machine Learning. In Proceedings of the 17th International Conference on High Performance Computing and Communications. 383–389.
Vish Viswanathan. 2014. Disclosure of H/W Prefetcher Control on some Intel Processors. Technical Report. Intel.
Christopher JCH Watkins and Peter Dayan: Q-learning. Machine learning 8(3–4), 279–292 (1992)
Richard Bellman, Dynamic programming, Princeton University Press, Princeton, N. J., 1957. MR 0090477.
Ankur Limaye and Tosiron Adegbija. A workload characterization of the spec cpu2017 benchmark suite. In Performance Analysis of Systems and Software (ISPASS), 2018 IEEE International Symposium on, pages 149–158. IEEE, 2018.
Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)
The Sniper Multi-Core Simulator. (2020). Sniper. Retrieved July 24, 2020, from https://snipersim.org/w/The_Sniper_Multi-Core_Simulator

Funding

The authors have not disclosed any funding.

Author information

Authors and Affiliations

Intel Corporation, 75 Reeds Rd, Hudson, MA, 01749, USA
Farah Fargo, Mitchell Diamond, Olivier Franza, Paul Foose, Jack Adiletta, Matthew Adiletta & Simon C. Steely Jr

Contributions

All authors whose names appear on the submission made substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data; or the creation of new software used in the work; drafted the work or revised it critically for important intellectual content; approved the version to be published; and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Farah Fargo.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fargo, F., Diamond, M., Franza, O. et al. Intelligent cache prefetchers in HPC architecture. Cluster Comput 28, 154 (2025). https://doi.org/10.1007/s10586-024-04854-0

Received: 28 June 2024
Revised: 02 October 2024
Accepted: 22 October 2024
Published: 21 January 2025
DOI: https://doi.org/10.1007/s10586-024-04854-0
