Abstract
Reinforcement learning (RL) is reshaping the field of artificial intelligence (AI) and is a step toward building optimal, autonomous systems with a higher level of understanding (Arulkumaran et al. in IEEE Signal Processing Mag 34(6):26–38, 2017). One of the main goals of AI is to produce fully autonomous agents that interact with several features of their environment and learn the optimal behavior. Applications vary in their data access patterns, and a static hardware configuration is not ideal for all phases of a workload. Today's Xeon cores have multiple data prefetchers that fetch the data expected to be used next; however, these prefetchers may interact in destructive ways. This destructive behavior can increase cache pollution, create memory bandwidth bottlenecks, and add occupancy to critical-path demand queues. Managing the aggressiveness of the prefetchers is necessary to mitigate these problems. Current hardware manages prefetcher aggressiveness by monitoring telemetry such as memory bandwidth and accuracy. However, this telemetry does not necessarily correlate with overall system performance. In addition, other solutions optimize prefetchers individually rather than allowing multiple features to work together. This research introduces hierarchical smart agents that use reinforcement learning to find the optimal aggressiveness for the Mid-level Cache (MLC) prefetchers at run time, managed by the Smart Prefetchers Manager (SPM). We expand our previous work by evaluating more workloads on a hierarchical model and by applying reinforcement learning in addition to an offline training approach. The approach is implemented and evaluated in a single-core, single-process environment to optimize the three MLC prefetchers at run time. Results demonstrate that reinforcement learning can achieve up to a 7.18% improvement in instructions per cycle (IPC) over the state-of-the-art hardware solution.
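To make the idea of learning prefetcher aggressiveness at run time concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the learning algorithm; tabular Q-learning is assumed here only because the reference list cites Watkins and Dayan. The aggressiveness levels, the two workload phases, the `simulated_ipc` reward, and all hyperparameters are hypothetical stand-ins. The sketch covers a single prefetcher and omits the hierarchical agents and the SPM.

```python
import random

# Hypothetical aggressiveness levels for one prefetcher (0 = off, 3 = most aggressive).
LEVELS = [0, 1, 2, 3]

# Hypothetical workload: 50 steps of a streaming phase, then 50 of pointer chasing, repeated.
SCHEDULE = ["streaming"] * 50 + ["pointer_chasing"] * 50


def simulated_ipc(phase, level):
    """Toy stand-in for a measured IPC (assumption, not data from the paper).

    Streaming phases reward the most aggressive level; pointer-chasing
    phases reward turning the prefetcher off.
    """
    best_level = {"streaming": 3, "pointer_chasing": 0}[phase]
    return 1.0 - 0.2 * abs(level - best_level)


def train(schedule, steps=20000, alpha=0.2, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning: state = workload phase, action = aggressiveness level."""
    rng = random.Random(seed)
    q = {(phase, level): 0.0 for phase in set(schedule) for level in LEVELS}
    phase = schedule[0]
    for t in range(steps):
        # Epsilon-greedy selection: explore occasionally, otherwise exploit.
        if rng.random() < epsilon:
            action = rng.choice(LEVELS)
        else:
            action = max(LEVELS, key=lambda a: q[(phase, a)])
        reward = simulated_ipc(phase, action)
        next_phase = schedule[(t + 1) % len(schedule)]
        # Q-learning update toward reward plus discounted best next-state value.
        best_next = max(q[(next_phase, a)] for a in LEVELS)
        q[(phase, action)] += alpha * (reward + gamma * best_next - q[(phase, action)])
        phase = next_phase
    return q


def greedy_policy(q):
    """Return the highest-valued aggressiveness level for each phase."""
    phases = {phase for phase, _ in q}
    return {p: max(LEVELS, key=lambda a: q[(p, a)]) for p in phases}


if __name__ == "__main__":
    print(greedy_policy(train(SCHEDULE)))
```

With these toy rewards, the learned policy is expected to select level 3 during the streaming phase and level 0 during pointer chasing. In a real system the reward would come from measured IPC rather than a closed-form function, and the state would be built from hardware telemetry rather than a known phase label.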









Data availability
No datasets were generated or analysed during the current study.
References
H. Devarajan, A. Kougkas, H. Zheng, V. Vishwanath and X. -H. Sun, "Stimulus: Accelerate Data Management for Scientific AI applications in HPC," 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Taormina, 2022
M. J. Adiletta, F. Fargo, M. Diamond, J. Adiletta, O. Franza and S. Steely, "A Reinforcement Learning Approach to Optimize Cache Prefetcher Aggressiveness at Run-Time," 2023 Tenth International Conference on Software Defined Systems (SDS), San Antonio, 2023
J. Doweck, Inside intel core microarchitecture and smart memory access-White Paper, Intel.
Le, H.Q., et al.: IBM POWER6 microarchitecture. IBM J. Res. Dev. 51, 639–662 (2007)
Owen, J., Steinman, M.: Northbridge architecture of AMD’s Griffin microprocessor family. IEEE Micro 28, 10–18 (2008)
J. Casazza, First the tick now the tock: Intel microarchitecture (Nehalem) - White Paper, Intel, 2009.
E. Ebrahimi, C. J. Lee, O. Mutlu and Y. N. Patt, "Prefetch-aware shared-resource management for multi-core systems," 2011 38th Annual International Symposium on Computer Architecture (ISCA), San Jose, 2011.
A. E. Papathanasiou and M. L. Scott. Aggressive prefetching: an idea whose time has come. Hot Topics in Operating Systems, 2005
W. Heirman, K. D. Bois, Y. Vandriessche, S. Eyerman, and I. Hur, “Near-side prefetch throttling: Adaptive prefetching for high-performance many-core processors,” in Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, ser. PACT ’18. New York: ACM, 2018, pp. 28:1–28:11. https://doi.org/10.1145/3243176.3243181
Santhosh Srinath, Onur Mutlu, Hyesoon Kim, and Yale N Patt. 2007. Feedback directed prefetching: improving the performance and bandwidth-efficiency of hardware prefetchers. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA). 63–74.
M. Kumar, S. Ali Khan, A. Bhatia, V. Sharma and P. Jain, "A Conceptual introduction of Machine Learning Algorithms," 2023 1st International Conference on Intelligent Computing and Research Trends (ICRT), Roorkee, 2023, pp. 1–7, https://doi.org/10.1109/ICRT57042.2023.10146676.
M. S. Kupriyanov, Y. A. Shichkina and S. E. Ilin, "Implementation of Temporal Action Detection Workflow for Video Data Analysis with the Use of Machine Learning Operations," 2024 V International Conference on Neural Networks and Neurotechnologies (NeuroNT), Saint Petersburg, Russian Federation, 2024.
Shih-wei Liao, Tzu-Han Hung, Donald Nguyen, Chinyen Chou, Chiaheng Tu, and Hucheng Zhou. 2009. Machine Learning-based Prefetch Optimization for Data Center Applications. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (SC ’09). 1–10.
J. Hiebel et al. Machine learning for fine-grained hardware prefetcher control. Proc. ICPP, p. 3, 2019.
Saami Rahman et al. “Maximizing hardware prefetch effectiveness with machine learning”. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS). IEEE, 2015, pp. 383–389.
A. Aziz, M. Cireno, E. Barros and B. Prado, "Balanced prefetching aggressiveness controller for NoC-based multiprocessor", SBCCI '14 Proceedings of the 27th Symposium on Integrated Circuits and Systems Design, 2014.
Panda, B., Balachandran, S.: CAFFEINE: a utility-driven prefetcher aggressiveness engine for multicores. ACM Trans. Archit. Code Optim. 12, 25 (2015). https://doi.org/10.1145/2806891
Panda, B.: SPAC: a synergistic prefetcher aggressiveness controller for multi-core systems. IEEE Trans. Comput. 65, 3740–3753 (2016)
Intel 2016. Intel 64 and IA-32 Architectures Developer’s Manual: Volume 3C. Intel
Saami Rahman, Martin Burtscher, Ziliang Zong, and Apan Qasem. 2015. Maximizing Hardware Prefetch Effectiveness with Machine Learning. In Proceedings of the 17th International Conference on High Performance Computing and Communications. 383–389.
Vish Viswanathan. 2014. Disclosure of H/W Prefetcher Control on some Intel Processors. Technical Report. Intel.
Christopher JCH Watkins and Peter Dayan: Q-learning. Machine learning 8(3–4), 279–292 (1992)
Richard Bellman, Dynamic programming, Princeton University Press, Princeton, N. J., 1957. MR 0090477.
Ankur Limaye and Tosiron Adegbija. A workload characterization of the spec cpu2017 benchmark suite. In Performance Analysis of Systems and Software (ISPASS), 2018 IEEE International Symposium on, pages 149–158. IEEE, 2018.
Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)
The Sniper Multi-Core Simulator. (2020). Sniper. Retrieved July 24, 2020, from https://snipersim.org/w/The_Sniper_Multi-Core_Simulator
Funding
The authors have not disclosed any funding.
Author information
Contributions
All authors whose names appear on the submission made substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data; or the creation of new software used in the work; drafted the work or revised it critically for important intellectual content; approved the version to be published; and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Fargo, F., Diamond, M., Franza, O. et al. Intelligent cache prefetchers in HPC architecture. Cluster Comput 28, 154 (2025). https://doi.org/10.1007/s10586-024-04854-0
DOI: https://doi.org/10.1007/s10586-024-04854-0