Filter cache: filtering useless cache blocks for a small but efficient shared last-level cache

The Journal of Supercomputing

Abstract

Although the shared last-level cache (SLLC) occupies a significant portion of the die area of a multicore CPU chip, more than 59% of SLLC cache blocks are never reused during their lifetime. If these useless blocks can be filtered out of the SLLC, its size can be reduced substantially without sacrificing performance. For this purpose, we classify cache block reuse into temporal and spatial reuse and analyze it further in terms of reuse interval and reuse count. Our experiments show that most spatially reused cache blocks are reused only once, with a short reuse interval, so it is inefficient to manage them in the SLLC. In this paper, we propose adding a small cache, called the Filter Cache, to the SLLC; it not only detects temporal reuse but also prevents spatially reused blocks from entering the SLLC. As a result, the SLLC does not hold data for non-reused or spatially reused blocks, which dramatically reduces its size. Through detailed simulation of the PARSEC benchmarks, we show that our new SLLC design with the Filter Cache delivers performance comparable to a conventional SLLC while using only 24.21% of its area across a variety of workloads. This is achieved by the faster access and higher reuse rates of the small SLLC backed by the Filter Cache.
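To make the filtering idea concrete, the sketch below models a small filter structure placed in front of a downsized SLLC. It is a minimal toy model, not the authors' simulator or hardware design: the FilterCache class, the entry counts, and the reuse_threshold promotion rule are illustrative assumptions rather than parameters from the paper. A block is first tracked in the filter and is admitted into the SLLC only after it shows repeated reuse there; blocks that are never re-referenced, or that are reused just once (as the abstract reports for spatial reuse), age out of the filter without ever occupying SLLC capacity.

```python
from collections import OrderedDict


class FilterCache:
    """Toy model of a downsized SLLC guarded by a small reuse filter (illustrative only)."""

    def __init__(self, filter_entries=256, sllc_entries=1024, reuse_threshold=2):
        self.filter = OrderedDict()            # block address -> reuse count seen in the filter
        self.sllc = OrderedDict()              # the small SLLC, kept in LRU order
        self.filter_entries = filter_entries   # capacity of the filter structure (assumed)
        self.sllc_entries = sllc_entries       # capacity of the downsized SLLC (assumed)
        self.reuse_threshold = reuse_threshold # reuses required before SLLC admission (assumed)

    def access(self, addr):
        """Return True if the access hits in the SLLC or the filter, False on a miss."""
        if addr in self.sllc:                  # hit in the small SLLC
            self.sllc.move_to_end(addr)        # refresh LRU position
            return True
        if addr in self.filter:                # reuse observed while the block is still in the filter
            self.filter[addr] += 1
            if self.filter[addr] >= self.reuse_threshold:
                # The block has proven repeated reuse: promote it into the SLLC.
                del self.filter[addr]
                self._insert_sllc(addr)
            # Blocks reused only once stay in the filter and simply age out,
            # so they never consume SLLC capacity.
            return True
        # First sight of this block: track it in the filter only.
        if len(self.filter) >= self.filter_entries:
            self.filter.popitem(last=False)    # evict the oldest tracked block
        self.filter[addr] = 0
        return False

    def _insert_sllc(self, addr):
        if len(self.sllc) >= self.sllc_entries:
            self.sllc.popitem(last=False)      # LRU eviction from the small SLLC
        self.sllc[addr] = True
```

Feeding the model an address stream in which only some addresses repeat shows the intended effect: repeatedly reused addresses end up in the SLLC, while one-shot addresses remain confined to the filter, which is the behavior the abstract attributes to the Filter Cache.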

Acknowledgements

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2017R1A2B2009641). This research was also supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2019-2015-0-00363) supervised by the IITP (Institute for Information & Communications Technology Promotion). This research was supported by Korea University.

Author information

Corresponding author

Correspondence to Lynn Choi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bae, H.J., Choi, L. Filter cache: filtering useless cache blocks for a small but efficient shared last-level cache. J Supercomput 76, 7521–7544 (2020). https://doi.org/10.1007/s11227-020-03177-2
