A Profiling-Based Approach to Cache Partitioning of Program Data

  • Conference paper
Parallel and Distributed Computing, Applications and Technologies (PDCAT 2022)

Abstract

Cache efficiency is important to avoid unnecessary data transfers and to keep processors active. Cache partitioning, a technique to virtually divide a cache into multiple partitions, has become available in recent hardware. Cache partitioning can improve efficiency by isolating data with high temporal locality to avoid its early eviction before reuse. However, deciding on the partitioning is challenging, because it depends on the locality of reference. To facilitate the decision-making, we propose a profiling-based approach that measures locality, providing knowledge for cache partitioning without requiring manual code analysis. We present a profiling tool and confirm its benefits through experiments on Fujitsu’s A64FX processor, which supports the cache partitioning mechanism called sector cache. Our results show ways to optimize program codes to improve cache efficiency.
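
The effect described in the abstract, where isolating data with high temporal locality protects it from early eviction, can be illustrated with a toy simulation. The sketch below is not the authors' profiling tool and does not model the A64FX sector cache. It assumes a fully associative LRU cache, and the names (`LRUCache`, `make_trace`, `misses_shared`, `misses_partitioned`), the trace shape, and the cache sizes are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first, newest last

    def access(self, line):
        """Return True on a hit, False on a miss (the line is then loaded)."""
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = None
        return False


def make_trace(iterations=10, reused=4, streamed=8):
    """Hypothetical trace: a small reused block, then fresh streamed lines."""
    trace = []
    next_stream = 0
    for _ in range(iterations):
        for i in range(reused):
            trace.append(("reused", i))  # same lines in every iteration
        for _ in range(streamed):
            trace.append(("stream", next_stream))  # never touched again
            next_stream += 1
    return trace


def misses_shared(trace, capacity):
    """All data competes for one cache of `capacity` lines."""
    cache = LRUCache(capacity)
    return sum(not cache.access(ref) for ref in trace)


def misses_partitioned(trace, sizes):
    """Each data class gets its own partition; `sizes` maps class -> lines."""
    parts = {name: LRUCache(size) for name, size in sizes.items()}
    return sum(not parts[name].access((name, line)) for name, line in trace)


if __name__ == "__main__":
    trace = make_trace()
    print("shared:", misses_shared(trace, 8))
    print("partitioned:", misses_partitioned(trace, {"reused": 4, "stream": 4}))
```

With these assumed parameters, every one of the 120 accesses misses in the shared 8-line cache, because the 8 streamed lines push the reused block out before its next use. Splitting the same 8 lines into two partitions of 4 lowers the count to 84 misses: the reused block misses only in the first iteration. Choosing such a split requires knowing which data is reused and how soon, which is the locality information the profiling approach aims to provide.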

Acknowledgement

This work has received funding from the European High-Performance Computing Joint Undertaking under grant agreement No. 956213 (SparCity), and from the Federal Ministry of Education and Research of Germany (project number 16HPC045). Performance results were obtained on systems in the test environment BEAST (Bavarian Energy Architecture & Software Testbed) (https://www.lrz.de/presse/ereignisse/2020-11-06_BEAST/) at the Leibniz Supercomputing Centre.

Author information

Corresponding author

Correspondence to Sergej Breiter.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Breiter, S., Weidendorfer, J., Chung, M.T., Fürlinger, K. (2023). A Profiling-Based Approach to Cache Partitioning of Program Data. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_35

  • DOI: https://doi.org/10.1007/978-3-031-29927-8_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29926-1

  • Online ISBN: 978-3-031-29927-8

  • eBook Packages: Computer Science, Computer Science (R0)
