Reducing the second-level cache conflict misses using a set folding technique

Shatnawi, Ali; Alsaedeen, Mohammad

doi:10.1007/s11227-017-2174-8

Published: 01 November 2017

The Journal of Supercomputing, volume 74, pages 970–993 (2018)

Ali Shatnawi¹ &
Mohammad Alsaedeen¹

Abstract

Cache memory has a direct effect on the performance of a computer system: fetching instructions and data from a fast cache instead of slow main memory saves hundreds of cycles per access. Reducing the cache miss ratio therefore shortens the execution time of an application. In this work, we propose cache memory designs that significantly reduce the number of conflict misses in the last-level multi-way set-associative cache. Each set is divided into a group of subsets: the first is referred to as the exclusive subset, and the rest are the shared subsets. The exclusive subset is configured as a traditional cache, where each block is mapped to the set whose index matches the block index. In addition to their standard cache indexing role, the shared subsets are configured to host blocks with different indices. A memory block can thus be mapped either to one subset of the exclusive type or to one of multiple subsets of the shared type. Because the technique combines multiple sets of the shared part into a larger set that is shared between memory blocks with different indices, we have chosen the name “set folding.” The decision as to where to map a memory block depends on the number of misses encountered at each of the potential target sets. To evaluate the proposed designs in terms of overall hit rate, twenty-three benchmarks from SPEC CPU 2006 were simulated using the SuperESCalar simulator (SESC). The proposed designs require a few extra storage bits, which add a small hardware overhead compared with a conventional cache; in return, they achieve lower miss rates for most of the benchmarks.
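The mapping idea described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' design: the class name `SetFoldingCache`, the helper `simulate`, the grouping of `fold` neighbouring sets, the LRU eviction, and the rule "use a free way first, otherwise pick the candidate set with the fewest recorded misses" are all assumptions made for illustration, since the abstract does not give these details.

```python
from collections import OrderedDict


class SetFoldingCache:
    """Toy model (hypothetical): each set has an exclusive subset and a shared subset.

    The shared subsets of `fold` neighbouring sets are treated as one folded pool
    that can host blocks from any set in that group.
    """

    def __init__(self, num_sets=2, exclusive_ways=1, shared_ways=1, fold=2):
        self.num_sets = num_sets
        self.exclusive_ways = exclusive_ways
        self.shared_ways = shared_ways
        self.fold = fold
        # OrderedDict keeps blocks in LRU order (oldest first).
        self.exclusive = [OrderedDict() for _ in range(num_sets)]
        self.shared = [OrderedDict() for _ in range(num_sets)]
        self.set_misses = [0] * num_sets  # assumed per-set miss counters
        self.hits = 0
        self.total_misses = 0

    def _group(self, index):
        """Indices of the sets whose shared subsets are folded together."""
        base = (index // self.fold) * self.fold
        return range(base, base + self.fold)

    def access(self, block):
        """Look up a block address; return True on a hit, False on a miss."""
        index = block % self.num_sets
        home = self.exclusive[index]
        if block in home:
            home.move_to_end(block)  # refresh LRU position
            self.hits += 1
            return True
        for s in self._group(index):
            if block in self.shared[s]:
                self.shared[s].move_to_end(block)
                self.hits += 1
                return True
        self.total_misses += 1
        self.set_misses[index] += 1
        self._place(block, index)
        return False

    def _place(self, block, index):
        """Assumed placement policy: free way first, else the set with fewest misses."""
        candidates = [(self.exclusive[index], self.exclusive_ways, index)]
        candidates += [(self.shared[s], self.shared_ways, s) for s in self._group(index)]
        for store, ways, _ in candidates:
            if len(store) < ways:
                store[block] = True
                return
        store, _, _ = min(candidates, key=lambda c: self.set_misses[c[2]])
        store.popitem(last=False)  # evict the least recently used block
        store[block] = True


def simulate(trace, **config):
    """Run a list of block addresses through a fresh cache and return it."""
    cache = SetFoldingCache(**config)
    for block in trace:
        cache.access(block)
    return cache


# Three blocks that all map to set 0, accessed cyclically.
trace = [0, 2, 4] * 3
folded = simulate(trace, fold=2)    # shared ways of sets 0 and 1 are folded
unfolded = simulate(trace, fold=1)  # each set keeps its shared ways to itself
print(folded.total_misses, unfolded.total_misses)  # prints: 3 7
```

In this toy trace, the three conflicting blocks fit once set 0 can borrow the shared way of its idle neighbour, so only the three cold misses remain; without folding, two of the blocks keep evicting each other. The numbers say nothing about the results reported in the article.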

Author information

Authors and Affiliations

Department of Computer Engineering, Faculty of Computer and Information Technology, Jordan University of Science and Technology, Irbid, 22110, Jordan
Ali Shatnawi & Mohammad Alsaedeen

Corresponding author

Correspondence to Ali Shatnawi.

About this article

Cite this article

Shatnawi, A., Alsaedeen, M. Reducing the second-level cache conflict misses using a set folding technique. J Supercomput 74, 970–993 (2018). https://doi.org/10.1007/s11227-017-2174-8

Issue Date: February 2018