Abstract
With the rapid development of multi-core and many-core processors, contention in the shared cache has become increasingly severe and restricts the performance improvement of parallel programs. Recent research has employed page coloring to realize cache partitioning on real systems and thereby reduce contention in the shared cache. However, page coloring-based cache partitioning has some side effects. One is that page coloring restricts the memory space that an application can allocate, which may lead to memory pressure. Another is that changing the cache partition dynamically requires massive page copying, which incurs large overhead. To make page coloring-based cache partitioning more practical, this paper proposes a malloc allocator-based dynamic cache partitioning mechanism with page coloring. Memory allocated by our malloc allocator can be dynamically partitioned among different applications according to the partitioning policy. Compared with coloring all pages, coloring only the dynamically allocated pages relieves memory pressure and reduces the page copying overhead caused by re-coloring. To further reduce this overhead, we introduce a minimum distance page copying strategy and a lazy flush strategy. We conduct experiments on a real system to evaluate these strategies, and the results show that they work well for reducing cache misses and re-coloring overhead.
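For readers unfamiliar with page coloring, the following is a minimal illustrative sketch of the general idea, not the allocator proposed in this paper. It assumes a physically indexed cache with plain set indexing (no hashed slice selection) and 4 KiB pages. The function names, the application names "A" and "B", and the cache geometry are hypothetical choices made for the example.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def num_colors(cache_size, associativity, page_size=PAGE_SIZE):
    """Colors = bytes per cache way / page size (at least 1)."""
    return max(1, cache_size // associativity // page_size)


def page_color(phys_addr, colors, page_size=PAGE_SIZE):
    """Color of the physical page that contains phys_addr."""
    return (phys_addr // page_size) % colors


def partition_colors(colors, shares):
    """Give each application a contiguous, disjoint range of colors.

    shares maps a (hypothetical) application name to a color count.
    """
    if sum(shares.values()) > colors:
        raise ValueError("requested more colors than the cache provides")
    plan, start = {}, 0
    for app, count in shares.items():
        plan[app] = range(start, start + count)
        start += count
    return plan


# Assumed geometry: 4 MiB, 16-way shared cache, which gives 64 colors.
colors = num_colors(4 * 1024 * 1024, 16)
plan = partition_colors(colors, {"A": 48, "B": 16})
```

Pages of the same color map to the same group of cache sets, so applications restricted to disjoint color ranges do not evict each other's lines. Changing an application's range means that pages whose color falls outside the new range must be copied to frames of an allowed color, which is the re-coloring overhead discussed above.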





Acknowledgments
We thank the anonymous reviewers for their insightful comments, which greatly improved the quality of this manuscript. This work is supported by the National Science Foundation of China under Grant Nos. 61073011 and 61133004, and by the National High-Tech Program of China (863 program) under Grant No. 2012AA01A302.
Cite this article
Zhang, L., Liu, Y., Wang, R. et al. Lightweight dynamic partitioning for last-level cache of multicore processor on real system. J Supercomput 69, 547–560 (2014). https://doi.org/10.1007/s11227-014-1092-2
DOI: https://doi.org/10.1007/s11227-014-1092-2