ABSTRACT
This paper proposes a cache management scheme for multiprogrammed, multithreaded applications, with the objective of maximizing performance for both individual applications and the workload mix as a whole. In this scheme, each application's performance is improved by increasing the priority of its slowest thread, while overall system performance is preserved by ensuring that one application's benefit does not come at the cost of significant degradation to other applications' threads sharing the same cache. Averaged over six workloads, our shared cache management scheme improves the performance of the combined applications by 18%. These improvements are also distributed fairly across the applications in each mix, as indicated by an average fair speedup improvement of 10% across the threads of each application (averaged over all workloads).
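The core idea in the abstract, boosting an application's slowest thread without taking capacity from co-running applications, can be sketched as a simple way-reallocation rule. The following is a minimal illustrative sketch, not the paper's actual mechanism: the function name, the progress metric, and the one-way-at-a-time transfer rule are all assumptions introduced here. It transfers a cache way from each application's fastest thread to its slowest, so the boost is funded strictly within the same application.

```python
# Hypothetical sketch: within each application, donate one cache way from
# the fastest thread to the slowest one. Because the donor and recipient
# belong to the same application, other applications' allocations are
# untouched, matching the "no degradation to others" constraint.
def rebalance_ways(apps, total_ways, donor_floor=1):
    """apps: {app_id: {thread_id: (progress, ways)}}, where progress is a
    normalized measure of how far a thread has advanced (lower = slower).
    Returns a new {(app_id, thread_id): ways} allocation."""
    alloc = {(a, t): w for a, threads in apps.items()
             for t, (_, w) in threads.items()}
    for a, threads in apps.items():
        slowest = min(threads, key=lambda t: threads[t][0])
        fastest = max(threads, key=lambda t: threads[t][0])
        # Only transfer if there is a genuine laggard and the donor keeps
        # at least donor_floor ways.
        if slowest != fastest and alloc[(a, fastest)] > donor_floor:
            alloc[(a, fastest)] -= 1   # donate within the same app
            alloc[(a, slowest)] += 1   # speed up the lagging thread
    assert sum(alloc.values()) == total_ways  # capacity is conserved
    return alloc
```

For example, with two applications of two threads each, all starting at 4 ways of a 16-way cache, each application's slowest thread ends up with 5 ways and its fastest with 3, while the total stays at 16.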
Index Terms
- Courteous cache sharing: being nice to others in capacity management