Lightweight dynamic partitioning for last-level cache of multicore processor on real system

The Journal of Supercomputing

Abstract

With the rapid development of multi-/many-core processors, contention in the shared cache is becoming increasingly serious and restricts the performance improvement of parallel programs. Recent research has employed the page coloring mechanism to realize cache partitioning on real systems and to reduce contention in the shared cache. However, page coloring-based cache partitioning has side effects. One is that page coloring restricts the memory space that an application can allocate, which may lead to memory pressure. Another is that changing the cache partition dynamically requires massive page copying, which incurs large overhead. To make page coloring-based cache partitioning more practical, this paper proposes a malloc allocator-based dynamic cache partitioning mechanism with page coloring. Memory allocated by our malloc allocator can be dynamically partitioned among different applications according to the partitioning policy. Compared with coloring all pages, coloring only the dynamically allocated pages relieves memory pressure and reduces the page copying overhead caused by re-coloring. To further reduce this overhead, we introduce a minimum distance page copying strategy and a lazy flush strategy. We conduct experiments on a real system to evaluate these strategies, and the results show that they work well for reducing cache misses and re-coloring overhead.
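
The page coloring idea that the abstract builds on can be illustrated with a minimal sketch. It is not the paper's allocator. It assumes a physically indexed, set-associative last-level cache with no address hashing, and the cache geometry (8 MiB, 16-way, 4 KiB pages) as well as the names `num_colors`, `page_color`, and `assign_colors` are hypothetical choices made only for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def num_colors(cache_size, associativity, page_size=PAGE_SIZE):
    """Number of page colors: pages that fit in one cache way."""
    way_size = cache_size // associativity
    return way_size // page_size


def page_color(phys_addr, colors, page_size=PAGE_SIZE):
    """Color of the physical page containing phys_addr.

    Pages with the same color map to the same group of cache sets,
    so they compete for the same cache space.
    """
    return (phys_addr // page_size) % colors


def assign_colors(colors, counts):
    """Give each application a disjoint, contiguous block of colors.

    `counts` maps an application name to the number of colors it gets.
    This is a hypothetical static policy, not the paper's policy.
    """
    if sum(counts.values()) > colors:
        raise ValueError("requested more colors than the cache provides")
    assignment, start = {}, 0
    for app, n in counts.items():
        assignment[app] = list(range(start, start + n))
        start += n
    return assignment


if __name__ == "__main__":
    colors = num_colors(8 * 1024 * 1024, 16)  # assumed 8 MiB, 16-way LLC
    print(colors)
    print(page_color(0x12345000, colors))
    print(assign_colors(colors, {"app_a": 96, "app_b": 32})["app_b"][:3])
```

Under these assumptions, an application restricted to a subset of colors can only receive physical pages whose frame number falls in that subset, which is the source of the memory pressure the abstract mentions. Changing an application's color set means copying its pages to frames of the new colors, which is the re-coloring overhead the paper aims to reduce.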

Acknowledgments

We thank the anonymous reviewers for their insightful comments, which greatly improved the quality of this manuscript. This work is supported by the National Science Foundation of China under Grant Nos. 61073011 and 61133004, and by the National High-Tech Program of China (863 Program) under Grant No. 2012AA01A302.

Author information

Corresponding author

Correspondence to Yi Liu.

About this article

Cite this article

Zhang, L., Liu, Y., Wang, R. et al. Lightweight dynamic partitioning for last-level cache of multicore processor on real system. J Supercomput 69, 547–560 (2014). https://doi.org/10.1007/s11227-014-1092-2

  • DOI: https://doi.org/10.1007/s11227-014-1092-2
